## What is the Mechanism of Action (MoA) of a drug? And why is it important?

In the past, scientists derived drugs from natural products or drew inspiration from traditional remedies. Very common drugs, such as paracetamol (known in the US as acetaminophen), were put into clinical use decades before the biological mechanisms driving their pharmacological activity were understood. Today, with the advent of more powerful technologies, drug discovery has shifted from the serendipitous approaches of the past to a more targeted model based on an understanding of the biological mechanism underlying a disease. In this framework, scientists seek to identify a protein target associated with a disease and to develop a molecule that can modulate that target. As shorthand for the biological activity of a given molecule, scientists assign it a label called its mechanism of action, or MoA for short.

## How do we determine the MoAs of a new drug?

One approach is to treat a sample of human cells with the drug and then analyze the cellular responses with algorithms that search for similarity to known patterns in large genomic databases, such as libraries of gene expression or cell viability patterns of drugs with known MoAs.
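The similarity search described above can be sketched roughly as follows. This is a minimal illustration, not the method used to label this dataset: the reference profiles, the labels `moa_a`, `moa_b`, `moa_c`, and the helper `cosine_similarity_to_references` are all hypothetical, the values are synthetic, and cosine similarity is just one of several plausible similarity measures.

```python
import numpy as np

def cosine_similarity_to_references(query, references):
    """Cosine similarity between one query profile and each reference row.

    Assumes neither the query nor any reference row is all zeros.
    """
    query = np.asarray(query, dtype=float)
    references = np.asarray(references, dtype=float)
    query_norm = np.linalg.norm(query)
    reference_norms = np.linalg.norm(references, axis=1)
    return references @ query / (reference_norms * query_norm)

# Hypothetical reference library: one synthetic response profile per known MoA
reference_labels = ["moa_a", "moa_b", "moa_c"]
reference_profiles = np.array([
    [1.0, 0.0, -1.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, -1.0, 0.5],
    [-1.0, -0.5, 1.0, 0.0, 1.0],
])

# Synthetic "new drug" response: a slightly noisy copy of the second profile
rng = np.random.default_rng(0)
query_profile = reference_profiles[1] + rng.normal(scale=0.05, size=5)

similarities = cosine_similarity_to_references(query_profile, reference_profiles)
best_match = reference_labels[int(np.argmax(similarities))]
print(best_match, similarities.round(3))
```

In practice the profiles would have hundreds of gene expression and cell viability features rather than five, and a drug may carry more than one MoA label, so a single best match is a simplification.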

# Imports

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
train_features = pd.read_csv('/kaggle/input/lish-moa/train_features.csv')
test_features = pd.read_csv('/kaggle/input/lish-moa/test_features.csv')
train_targets_scored = pd.read_csv('/kaggle/input/lish-moa/train_targets_scored.csv')
train_targets_nonscored = pd.read_csv('/kaggle/input/lish-moa/train_targets_nonscored.csv')

# What does the data look like?

In [None]:
train_features.head(5)

In [None]:
train_features.describe()

In [None]:
train_features.dtypes

In [None]:
test_features.head(5)

In [None]:
test_features.describe()

In [None]:
test_features.dtypes

In [None]:
train_targets_scored.head(5)

In [None]:
train_targets_scored.shape

In [None]:
train_targets_scored.describe()

In [None]:
train_targets_scored.dtypes

In [None]:
train_targets_nonscored.head(5)

In [None]:
train_targets_nonscored.shape

In [None]:
train_targets_nonscored.describe()

In [None]:
train_targets_nonscored.dtypes

# Checking for null values

In [None]:
train_features.isnull().sum()

In [None]:
test_features.isnull().sum()

In [None]:
train_targets_scored.isnull().sum()

In [None]:
train_targets_nonscored.isnull().sum()
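The four per-table checks above print one long Series each. As a possible alternative, the sketch below condenses them into one row per table. The helper name `missing_value_summary` and the demo frames are assumptions made for illustration; with the real data you would pass the four DataFrames loaded earlier instead.

```python
import numpy as np
import pandas as pd

def missing_value_summary(frames):
    """Summarise missing values for a dict of {name: DataFrame}."""
    rows = []
    for name, frame in frames.items():
        null_counts = frame.isnull().sum()  # missing values per column
        rows.append({
            "table": name,
            "total_missing": int(null_counts.sum()),
            "columns_with_missing": int((null_counts > 0).sum()),
        })
    return pd.DataFrame(rows).set_index("table")

# Small synthetic frames standing in for the real tables
demo_frames = {
    "demo_features": pd.DataFrame({"g-0": [0.1, np.nan, 0.3], "c-0": [1.0, 2.0, 3.0]}),
    "demo_targets": pd.DataFrame({"target_a": [0, 1, 0]}),
}
summary = missing_value_summary(demo_frames)
print(summary)
```

With the notebook's data, the call might look like `missing_value_summary({"train_features": train_features, "test_features": test_features, "train_targets_scored": train_targets_scored, "train_targets_nonscored": train_targets_nonscored})`.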

## Number of gene expression columns

In [None]:
train_features.columns.str.startswith('g-').sum()

## Number of cell viability columns

In [None]:
train_features.columns.str.startswith('c-').sum()
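Both counts use the same prefix test, so they can be gathered in one call that also reports how many columns match neither prefix. The helper `count_columns_by_prefix` and the demo frame (including the column name `sample_id`) are hypothetical and only illustrate the idea.

```python
import pandas as pd

def count_columns_by_prefix(frame, prefixes=("g-", "c-")):
    """Count columns whose names start with each prefix, plus the remainder."""
    counts = {
        prefix: int(frame.columns.str.startswith(prefix).sum())
        for prefix in prefixes
    }
    # Columns matching none of the prefixes (assumes prefixes do not overlap)
    counts["other"] = len(frame.columns) - sum(counts.values())
    return counts

# Synthetic frame with three "g-" columns, two "c-" columns and one other column
demo = pd.DataFrame(
    [[0, 0.1, 0.2, 0.3, 1.0, 2.0]],
    columns=["sample_id", "g-0", "g-1", "g-2", "c-0", "c-1"],
)
column_counts = count_columns_by_prefix(demo)
print(column_counts)
```

Calling `count_columns_by_prefix(train_features)` would give the two counts above together with the number of remaining columns.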

## Distribution

In [None]:
plt.figure(figsize=(16, 16))
# A sample of sixteen cell viability columns, one histogram per subplot
cols = [
    'c-1', 'c-2', 'c-3', 'c-4',
    'c-5', 'c-6', 'c-7', 'c-8',
    'c-92', 'c-93', 'c-94', 'c-95',
    'c-96', 'c-97', 'c-98', 'c-99']
for i, col in enumerate(cols):
    plt.subplot(4, 4, i + 1)
    plt.hist(train_features.loc[:, col], bins=100, alpha=1, color='#66bfbf')
    plt.title(col)

In [None]:
plt.figure(figsize=(16, 16))
# A sample of sixteen gene expression columns, one histogram per subplot
cols = [
    'g-1', 'g-2', 'g-3', 'g-4',
    'g-5', 'g-6', 'g-7', 'g-8',
    'g-92', 'g-93', 'g-94', 'g-95',
    'g-96', 'g-97', 'g-98', 'g-99']
for i, col in enumerate(cols):
    plt.subplot(4, 4, i + 1)
    plt.hist(train_features.loc[:, col], bins=100, alpha=1, color='#66bfbf')
    plt.title(col)
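Histograms are easy to read for a handful of columns but hard to compare across hundreds. One possible complement is a numeric summary of each column's shape. The sketch below is an illustration on synthetic data; the helper name `distribution_summary` is an assumption, and `skew` and `kurt` are the standard pandas sample skewness and excess kurtosis.

```python
import numpy as np
import pandas as pd

def distribution_summary(frame, cols):
    """Mean, standard deviation, skewness and excess kurtosis per column."""
    subset = frame.loc[:, cols]
    return pd.DataFrame({
        "mean": subset.mean(),
        "std": subset.std(),
        "skew": subset.skew(),
        "kurtosis": subset.kurt(),
    })

# Synthetic columns: one roughly symmetric, one strongly right-skewed
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "c-1": rng.normal(size=1000),
    "g-1": rng.exponential(size=1000),
})
shape_stats = distribution_summary(demo, ["c-1", "g-1"])
print(shape_stats.round(2))
```

On the real data, `distribution_summary(train_features, cols)` would produce one row per plotted column, which makes strongly skewed or heavy-tailed features easier to spot.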