### Introduction to NLP
NLP (Natural Language Processing) is a subfield of AI (Artificial Intelligence) that teaches computers to understand and process human language. It involves building software that can recognize, understand, and analyze audio and text data expressed in natural language. Common applications of NLP include sentiment analysis, text categorization, speech recognition, and language translation.


NLP enables more natural interactions between humans and computers, which is useful in many industries, for example customer service and language learning. It is also used in data analysis and research because it can find patterns in, and draw conclusions from, very large amounts of data. NLP is a developing field that draws on other techniques such as machine learning and deep learning.

### Importing Libraries
In Python, the keyword `import` is used to bring an external library into the current environment.
Importing a library is simple: write `import` followed by the library name.
Additionally, we can use the `as` keyword to give the library an alias.

Execute the cell below to import the numpy and pandas libraries.
Note that we have used `np` and `pd` as the aliases for the libraries to reduce typing effort in subsequent cells.

In [14]:
import numpy as np
import pandas as pd

### Creating a Data Frame
The pandas library provides the `DataFrame` class for creating a data frame: a tabular representation of data, with rows as observations and columns as attributes.
The `DataFrame` constructor takes multiple arguments.
Take the example below, where we pass 3 arguments:
1. The data for the table (generated using the random module from numpy).
2. The row names.
3. The column names.

In [15]:
# Creating a data frame
df = pd.DataFrame(np.random.randn(5, 4), ['A', 'B', 'C', 'D', 'E'], ['W', 'X', 'Y', 'Z'])
# Printing the dataframe
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.012005 | 0.169241 | 1.663855 | 0.715747 |
| B | -1.292557 | 0.190079 | -0.635299 | -0.195486 |
| C | 0.677509 | 0.269992 | 0.056831 | 1.237851 |
| D | 1.174516 | 0.596607 | 0.901035 | 0.8195 |
| E | 0.393833 | 1.512933 | -1.164198 | -0.554532 |
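
The three positional arguments in the cell above correspond to the `data`, `index`, and `columns` parameters of `pd.DataFrame`. As a minimal sketch, the same kind of table can be built with keyword arguments and a seeded random generator so that the values are reproducible. The seed `101` and the names `rng` and `df_demo` are chosen only for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs produce the same numbers (the seed is arbitrary)
rng = np.random.default_rng(101)

df_demo = pd.DataFrame(
    data=rng.standard_normal((5, 4)),      # 5 rows x 4 columns of random values
    index=["A", "B", "C", "D", "E"],       # row names
    columns=["W", "X", "Y", "Z"],          # column names
)
print(df_demo.shape)  # (5, 4)
```

Passing the arguments by keyword makes it harder to mix up the row names and the column names.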


### Selecting columns
To select a specific column, we can use this notation: `df["<column_name>"]`  
Execute the cell below to select the column `"W"`.

In [16]:
# Grabbing a column
df["W"]

A    0.012005
B   -1.292557
C    0.677509
D    1.174516
E    0.393833
Name: W, dtype: float64

Another way of selecting a column is to use the dot notation, i.e., the data frame variable followed by a dot and the column name.

In [17]:
# Grabbing a column using dot notation
df.W

A    0.012005
B   -1.292557
C    0.677509
D    1.174516
E    0.393833
Name: W, dtype: float64
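
Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing `DataFrame` attribute. The sketch below illustrates this with a small hypothetical table; the column names `"my col"` and `"count"` are made up for the example.

```python
import pandas as pd

df_demo = pd.DataFrame({"W": [1.0, 2.0], "my col": [3.0, 4.0], "count": [5, 6]})

# "W" is a valid identifier with no name clash, so both notations agree
same = df_demo.W.equals(df_demo["W"])

# "count" clashes with the DataFrame.count method, so dot access returns the method
count_is_method = callable(df_demo.count)

# Bracket notation always returns the column, even with spaces or name clashes
count_column = df_demo["count"]
space_column = df_demo["my col"]
```

For this reason, bracket notation is usually the safer default in scripts.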

To select multiple columns, pass the column names as a list.  
Execute the cell below to select the columns `"W"` and `"Z"`.

In [18]:
# Grabbing multiple columns
df[["W", "Z"]]

|   | W | Z |
|---|---|---|
| A | 0.012005 | 0.715747 |
| B | -1.292557 | -0.195486 |
| C | 0.677509 | 1.237851 |
| D | 1.174516 | 0.8195 |
| E | 0.393833 | -0.554532 |
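
Selecting with a single label and selecting with a list return different types, which explains the different output formats in the cells above. A minimal sketch on a small hypothetical table (`df_demo` is an illustrative name):

```python
import pandas as pd

df_demo = pd.DataFrame({"W": [1.0, 2.0], "Z": [3.0, 4.0]}, index=["A", "B"])

single = df_demo["W"]        # one label -> pandas Series
as_frame = df_demo[["W"]]    # list with one label -> one-column DataFrame

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
```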


### Creating a new column
To create a new column, specify the new column name in square brackets and assign the data to it.  
In the cell below, we create a new column called `"new"` which contains the sum of the values of the columns `"W"` and `"Y"`.

In [19]:
df["new"] = df["W"] + df["Y"]
df

|   | W | X | Y | Z | new |
|---|---|---|---|---|---|
| A | 0.012005 | 0.169241 | 1.663855 | 0.715747 | 1.67586 |
| B | -1.292557 | 0.190079 | -0.635299 | -0.195486 | -1.927856 |
| C | 0.677509 | 0.269992 | 0.056831 | 1.237851 | 0.73434 |
| D | 1.174516 | 0.596607 | 0.901035 | 0.8195 | 2.075551 |
| E | 0.393833 | 1.512933 | -1.164198 | -0.554532 | -0.770365 |
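
Bracket assignment modifies the data frame itself. If the original should stay untouched, one possible alternative is the `assign` method, which returns a new data frame with the extra column. The sketch below uses a small hypothetical table rather than the random data above.

```python
import pandas as pd

df_demo = pd.DataFrame({"W": [1.0, 2.0], "Y": [10.0, 20.0]}, index=["A", "B"])

# assign returns a new DataFrame; df_demo itself is not modified
with_sum = df_demo.assign(new=df_demo["W"] + df_demo["Y"])

print(list(with_sum.columns))  # ['W', 'Y', 'new']
print(list(df_demo.columns))   # ['W', 'Y']
```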


### Dropping a column
To drop a column, use the `drop` method with the argument `axis` set to `1`.  
Setting the `inplace` argument to `True` modifies the data frame in place instead of returning a new one.

In [20]:
df.drop("new", axis=1, inplace=True)
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.012005 | 0.169241 | 1.663855 | 0.715747 |
| B | -1.292557 | 0.190079 | -0.635299 | -0.195486 |
| C | 0.677509 | 0.269992 | 0.056831 | 1.237851 |
| D | 1.174516 | 0.596607 | 0.901035 | 0.8195 |
| E | 0.393833 | 1.512933 | -1.164198 | -0.554532 |
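
To make the effect of `inplace` concrete, the sketch below calls `drop` without it on a small hypothetical table. The call returns a new data frame and leaves the original unchanged. It also uses the `columns` keyword, which is an alternative to passing `axis=1`.

```python
import pandas as pd

df_demo = pd.DataFrame({"W": [1, 2], "X": [3, 4], "new": [4, 6]}, index=["A", "B"])

# Without inplace=True, drop returns a new DataFrame
trimmed = df_demo.drop(columns="new")   # same result as df_demo.drop("new", axis=1)

print(list(trimmed.columns))   # ['W', 'X']
print(list(df_demo.columns))   # ['W', 'X', 'new']
```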


### Dropping a row
To drop a row, we use the same `drop` method, but leave the `axis` argument at its default value of `0`.

In [21]:
df.drop("E", inplace=True)
df

|   | W | X | Y | Z |
|---|---|---|---|---|
| A | 0.012005 | 0.169241 | 1.663855 | 0.715747 |
| B | -1.292557 | 0.190079 | -0.635299 | -0.195486 |
| C | 0.677509 | 0.269992 | 0.056831 | 1.237851 |
| D | 1.174516 | 0.596607 | 0.901035 | 0.8195 |
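
Because the row is removed in place, running the same cell a second time raises a `KeyError`: the label `"E"` no longer exists. The sketch below reproduces this behavior on a small hypothetical table and shows the `errors="ignore"` option of `drop`, which skips missing labels. The `index` keyword used here is an alternative to relying on the default `axis=0`.

```python
import pandas as pd

df_demo = pd.DataFrame({"W": [1, 2, 3]}, index=["A", "B", "C"])

# Equivalent to df_demo.drop("C", axis=0); returns a new DataFrame
without_c = df_demo.drop(index="C")

# Dropping a label that does not exist raises KeyError by default
try:
    df_demo.drop(index="Q")
    raised = False
except KeyError:
    raised = True

# errors="ignore" skips missing labels instead of raising
unchanged = df_demo.drop(index="Q", errors="ignore")
```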


### Selecting a row
To select a row by its name (label), use the `loc` indexer followed by the name in square brackets.  
Here, we select the row `"A"`.

In [22]:
df.loc["A"]

W    0.012005
X    0.169241
Y    1.663855
Z    0.715747
Name: A, dtype: float64

Another way to select a row is the `iloc` indexer, which uses the row's integer position to fetch data.  
In this example, we fetch the third row (positions start at `0`).

In [23]:
df.iloc[2]

W    0.677509
X    0.269992
Y    0.056831
Z    1.237851
Name: C, dtype: float64
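
The difference between `loc` and `iloc` is easiest to see when the row labels are themselves integers. The sketch below uses a hypothetical table whose labels are `10`, `20`, and `30`; the names and values are illustrative only.

```python
import pandas as pd

df_demo = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

by_label = df_demo.loc[10]      # row whose label is 10
by_position = df_demo.iloc[0]   # first row, regardless of its label

# iloc also accepts slices; the end position is excluded
first_two = df_demo.iloc[0:2]
```

Here `df_demo.loc[0]` would raise a `KeyError`, because no row carries the label `0`.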

### Selecting a particular cell in the data frame
Use the `loc` indexer to fetch a particular cell. You need to pass both the row name and the column name, like so:

In [24]:
df.loc["B","Y"]

-0.6352989034023595
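
For single cells, pandas also offers the `at` accessor, and `loc` can be used on the left-hand side of an assignment to overwrite a value. A minimal sketch on a hypothetical table:

```python
import pandas as pd

df_demo = pd.DataFrame({"W": [1.0, 2.0], "Y": [3.0, 4.0]}, index=["A", "B"])

value = df_demo.at["B", "Y"]   # same cell as df_demo.loc["B", "Y"]
df_demo.loc["B", "Y"] = 0.0    # loc can also assign to a single cell

print(value)                   # 4.0
print(df_demo.at["B", "Y"])    # 0.0
```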

### Selecting multiple rows and columns
Again, the `loc` indexer can be used to select multiple rows and columns at once.  
Just pass the row names and column names in separate lists.

In [25]:
df.loc[["A", "B"], ["W", "Y"]]

|   | W | Y |
|---|---|---|
| A | 0.012005 | 1.663855 |
| B | -1.292557 | -0.635299 |
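
Besides lists, `loc` also accepts label slices. Unlike ordinary Python slices, label slices include both endpoints. The sketch below shows this on a small hypothetical table.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["A", "B", "C"],
    columns=["W", "X", "Y"],
)

# Label slices include both endpoints: rows A and B, columns W through Y
block = df_demo.loc["A":"B", "W":"Y"]

# A lone colon selects everything along that axis
all_rows_two_cols = df_demo.loc[:, ["W", "Y"]]
```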


### Reading tabular data
For most practical applications, we will download and read existing data.  
To do that, `pandas` has a `read_csv` function to read CSV files.  
In this example, we read the Titanic dataset.

In [26]:
titanic = pd.read_csv("titanic.csv")
titanic.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |
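
If `titanic.csv` is not available locally, `read_csv` can be tried on an in-memory string instead. The sketch below is an illustration only: it uses a few columns from the first three rows shown above, and the `index_col` argument (an optional `read_csv` parameter) turns `PassengerId` into the row index.

```python
import io

import pandas as pd

# A tiny CSV excerpt held in a string instead of a file on disk
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22.0
2,1,1,female,38.0
3,1,3,female,26.0
"""

# read_csv accepts file-like objects as well as file paths
sample = pd.read_csv(io.StringIO(csv_text), index_col="PassengerId")
print(sample.head(2))
```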
