# NumPy

__NumPy__, short for Numerical Python, is the core library for scientific computing in Python. It provides a high-performance __multidimensional array object__ (the ndarray) and tools for working with these arrays.

### Installing NumPy

See the official installation guide: https://numpy.org/install/

In [1]:
# conda install numpy
# OR
# pip install numpy

### Getting Started

NumPy is usually imported under the np alias.

In [2]:
import numpy as np 

### Creating an ndarray object

To create an ndarray, pass a list, tuple, or any other array-like object to the __array() function__, which converts it into an ndarray.

In [3]:
# create a 1-dimensional array with a list
a = np.array([1, 2, 3, 4])
# create a 1-dimensional array with a tuple
b = np.array((5, 6, 7, 8))

print(a)
print(type(a))

print(b)
print(type(b))

[1 2 3 4]
<class 'numpy.ndarray'>
[5 6 7 8]
<class 'numpy.ndarray'>


In [4]:
# 2-dimensional array (3 rows, 4 columns)
np.array([[1, 2, 5, 7], [2, 0, 4, 6], [3, 4, 5, 6]])

array([[1, 2, 5, 7],
       [2, 0, 4, 6],
       [3, 4, 5, 6]])
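
The number of dimensions follows from how deeply the input lists are nested, and it can be checked with the `ndim` attribute. The snippet below is a minimal sketch; the variable names `vector`, `matrix` and `cube` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# one level of nesting -> 1 dimension
vector = np.array([1, 2, 3])
# two levels of nesting -> 2 dimensions (2 rows, 3 columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
# three levels of nesting -> 3 dimensions (2 blocks of 2 rows and 2 columns)
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(vector.ndim, matrix.ndim, cube.ndim)
print(matrix.shape, cube.shape)
```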

To create sequences of numbers, NumPy provides the __arange() function__. Called with a single argument n, it returns the integers from 0 up to, but not including, n.

In [5]:
np.arange(12)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
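
`arange` also accepts a start, a stop and a step, similar to Python's built-in `range`. A short sketch with illustrative variable names:

```python
import numpy as np

# start at 2, stop before 20, step by 3
stepped = np.arange(2, 20, 3)
# unlike range(), arange also accepts non-integer steps
fractions = np.arange(0, 1, 0.25)

print(stepped)
print(fractions)
```

For non-integer steps, `np.linspace` is often the safer choice, because floating-point rounding can make the length of an `arange` result hard to predict.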

__reshape()__ returns an array with the same data arranged in a new shape. The total number of elements must stay the same.

In [6]:
# 2-d array
np.arange(12).reshape(2,6)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [7]:
# 2-d array with 4 rows and 3 columns
np.arange(12).reshape(4,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
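
Two details of `reshape` are worth a quick illustration: passing `-1` lets NumPy infer one dimension from the total size, and a shape whose element count does not match raises a `ValueError`. This is a small sketch with illustrative variable names:

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer this dimension: 12 elements / 3 rows = 4 columns
inferred = values.reshape(3, -1)
# a 3-dimensional array: 2 blocks of 2 rows and 3 columns
three_d = values.reshape(2, 2, 3)

print(inferred.shape)
print(three_d.shape)

# 5 * 3 = 15 does not match the 12 available elements
try:
    values.reshape(5, 3)
except ValueError as err:
    error_message = str(err)
    print("reshape failed:", error_message)
```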

The __shape attribute__ holds the number of elements along each axis.

In [8]:
arr = np.arange(12).reshape(4,3)

arr.shape

(4, 3)
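
Because `shape` is an ordinary tuple, it can be unpacked, and it relates directly to the `ndim` and `size` attributes. A brief sketch (the names `n_rows` and `n_cols` are illustrative):

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

# shape is a plain tuple, so it can be unpacked
n_rows, n_cols = arr.shape

print(n_rows, n_cols)  # rows and columns
print(arr.ndim)        # number of axes, equal to len(arr.shape)
print(arr.size)        # total number of elements, n_rows * n_cols
print(len(arr))        # length of the first axis only
```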

### Basic Operations

Arithmetic operators on arrays apply elementwise.

In [9]:
a = np.array([20,30,40,50])
b = np.arange(4)

print(a)
print(b)

[20 30 40 50]
[0 1 2 3]


In [10]:
# subtraction
c = a-b

print(c)

[20 29 38 47]


In [11]:
# exponentiation: square each element
d = b**2

print(d)

[0 1 4 9]


In [12]:
# raise each element of a to the power of the matching element of b
e = a**b

print(e)

[     1     30   1600 125000]
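
Multiplication with `*` is elementwise as well, and a scalar operand is applied to every element. The matrix (dot) product uses the `@` operator instead. A minimal sketch reusing the values of `a` and `b` from above (the result names are illustrative):

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)

# elementwise product: 20*0, 30*1, 40*2, 50*3
product = a * b
# a scalar is applied to every element
scaled = a * 2
# dot product of two 1-d arrays: the sum of the elementwise products
dot = a @ b

print(product)
print(scaled)
print(dot)
```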


In [13]:
# elementwise comparison, which produces a boolean array
f = e < 33

print(f)

[ True  True False False]
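
A boolean array like this can be used as a mask to select elements, and summing it counts the `True` entries. A short sketch with illustrative names:

```python
import numpy as np

e = np.array([1, 30, 1600, 125000])
mask = e < 33

# indexing with a boolean array keeps the elements where the mask is True
small = e[mask]
# True counts as 1, so the sum is the number of matches
n_small = int(mask.sum())

print(small)
print(n_small)
```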


This should be enough to get you started. NumPy has many more functions, which you can explore in the quickstart guide: https://numpy.org/devdocs/user/quickstart.html

# Pandas

Pandas is a high-level data manipulation tool built on the NumPy package. Its key data structure, the DataFrame, lets you store and manipulate tabular data, much like a spreadsheet.

### Installing pandas

See the official installation guide: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html

In [14]:
# conda install pandas
# OR
# pip install pandas

### Getting Started

Pandas is usually imported under the pd alias.

In [15]:
import pandas as pd

### Creating a DataFrame object

In [16]:
# creating an empty dataframe
df = pd.DataFrame()

print(type(df))
df

<class 'pandas.core.frame.DataFrame'>


In [17]:
# from a numpy ndarray
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns = ['a', 'b', 'c'])

df2

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


In [18]:
# from a dictionary

data = {'First Name':  ['Franziska', 'Mary Ann', 'Sohee', 'Sven'],
        'Last Name': ['Mack', 'Badavi', 'Cho', 'Travis'],
        }

df3 = pd.DataFrame(data, columns = ['First Name','Last Name'])

df3

|   | First Name | Last Name |
|---|---|---|
| 0 | Franziska | Mack |
| 1 | Mary Ann | Badavi |
| 2 | Sohee | Cho |
| 3 | Sven | Travis |


In [19]:
# size of df3 (row, column)
df3.shape

(4, 2)
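
A DataFrame can also be built from a list of records, where each dictionary describes one row. The sketch below uses illustrative names (`records`, `people`) that are not part of the cells above:

```python
import pandas as pd

# each dictionary is one row; the keys become the column names
records = [
    {'First Name': 'Franziska', 'Last Name': 'Mack'},
    {'First Name': 'Sohee', 'Last Name': 'Cho'},
]

people = pd.DataFrame(records)

print(people.shape)
print(list(people.columns))
```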

### Accessing the DataFrame

In [20]:
# accessing a column by name
list(df3['First Name'])

# we create a list for nicer representation

['Franziska', 'Mary Ann', 'Sohee', 'Sven']

In [21]:
# accessing a row by index label with .loc
list(df3.loc[3])

['Sven', 'Travis']

In [22]:
# accessing a row by position through the underlying NumPy array (.values)
print(df3.values[1])

['Mary Ann' 'Badavi']


In [23]:
# accessing a single cell by row and column position with .iloc
print(df3.iloc[1, 0])

Mary Ann
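
With the default index, labels and positions coincide, so `.loc` and `.iloc` appear to behave the same. The difference shows up once the index labels are no longer 0, 1, 2, and so on. The sketch below assumes a made-up index of 10, 20, 30, 40 purely for illustration:

```python
import pandas as pd

people = pd.DataFrame(
    {'First Name': ['Franziska', 'Mary Ann', 'Sohee', 'Sven'],
     'Last Name': ['Mack', 'Badavi', 'Cho', 'Travis']},
    index=[10, 20, 30, 40],  # hypothetical labels, chosen to differ from positions
)

# .loc selects by label: the row whose index label is 20
by_label = people.loc[20]
# .iloc selects by position: the second row
by_position = people.iloc[1]
# a single cell by row label and column name
last_name = people.loc[20, 'Last Name']

print(list(by_label))
print(list(by_position))
print(last_name)
```

Here `people.loc[1]` would raise a `KeyError`, because no row carries the label 1.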


### Reading in files

In [24]:
import pandas as pd

file_path = './data/movie_reviews_sample.csv'

# reading a csv file into a pandas dataframe
df = pd.read_csv(file_path)

# looking at the first 5 rows
df.head(5)

|   | movie | user_name | review | date | rating |
|---|---|---|---|---|---|
| 0 | Toy Story | Quinoa1984 | Toy Story is a sheer delight to view on the sc... | 13 February 2000 | 10 |
| 1 | Toy Story | alexkolokotronis | Toy Story is the film that started Pixar Anima... | 3 February 2009 | 10 |
| 2 | Toy Story | SmileysWorld | Though I am not a big fan of computer animatio... | 30 December 2005 | 9 |
| 3 | Toy Story | slokes | Toy story was a fun, imaginative renewal of th... | 10 May 2004 | 8 |
| 4 | Toy Story | Lady_Targaryen | When Toy Story came, in 1995,I was 9 years old... | 10 November 2005 | 7 |
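
If the sample file is not at hand, `read_csv` can be tried on in-memory text through `io.StringIO`. The rows below are made up and only mimic the column layout shown above; `usecols` restricts the columns that are loaded.

```python
import io

import pandas as pd

# made-up rows that only imitate the column layout of the sample file
csv_text = """movie,user_name,review,date,rating
Example Movie,user_a,"Great fun, would watch again",1 January 2020,9
Example Movie,user_b,Not for me,2 January 2020,3
"""

# read_csv accepts a file path or any file-like object
reviews = pd.read_csv(io.StringIO(csv_text))

# load only the columns that are needed
subset = pd.read_csv(io.StringIO(csv_text), usecols=['review', 'rating'])

print(reviews.shape)
print(list(subset.columns))
```

Quoted fields may contain commas, which is why the first review is read as a single value.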


### Modifying the DataFrame

In [25]:
# drop a column by position (df.columns[1] is 'user_name')
df.drop(df.columns[1], axis=1, inplace=True)

# axis=1 refers to columns (axis=0 refers to rows)
# inplace=True modifies df directly instead of returning a new DataFrame

df

|   | movie | review | date | rating |
|---|---|---|---|---|
| 0 | Toy Story | Toy Story is a sheer delight to view on the sc... | 13 February 2000 | 10 |
| 1 | Toy Story | Toy Story is the film that started Pixar Anima... | 3 February 2009 | 10 |
| 2 | Toy Story | Though I am not a big fan of computer animatio... | 30 December 2005 | 9 |
| 3 | Toy Story | Toy story was a fun, imaginative renewal of th... | 10 May 2004 | 8 |
| 4 | Toy Story | When Toy Story came, in 1995,I was 9 years old... | 10 November 2005 | 7 |
| 5 | Toy Story | A great movie turned into a franchise the tale... | 19 July 2018 | 8 |
| 6 | Finding Nemo | I'll be totally honest and confirm to you that... | 10 December 2003 | 8 |
| 7 | Finding Nemo | The character "Dory" was depicted with symptom... | 4 August 2003 | 3 |
| 8 | Finding Nemo | I will not say this film was excellent, and I ... | 6 September 2015 | 6 |
| 9 | Monsters, Inc. | Pixar is the best! Of them all, Monsters, Inc.... | 16 January 2005 | 10 |
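
Without `inplace=True`, `drop` leaves the original DataFrame untouched and returns a new one. The `columns=` keyword is an alternative to combining a list of names with `axis=1`. A minimal sketch on a small made-up frame:

```python
import pandas as pd

# small made-up frame for illustration
frame = pd.DataFrame({
    'movie': ['A', 'B'],
    'review': ['good', 'bad'],
    'rating': [9, 2],
})

# returns a new DataFrame; frame itself keeps all three columns
trimmed = frame.drop(columns=['movie'])

print(list(frame.columns))
print(list(trimmed.columns))
```

Assigning the result to a new name keeps the original data available, which can help when experimenting in a notebook.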


In [26]:
# drop columns by name
df.drop(['movie', 'date'], axis=1, inplace=True)

df

|   | review | rating |
|---|---|---|
| 0 | Toy Story is a sheer delight to view on the sc... | 10 |
| 1 | Toy Story is the film that started Pixar Anima... | 10 |
| 2 | Though I am not a big fan of computer animatio... | 9 |
| 3 | Toy story was a fun, imaginative renewal of th... | 8 |
| 4 | When Toy Story came, in 1995,I was 9 years old... | 7 |
| 5 | A great movie turned into a franchise the tale... | 8 |
| 6 | I'll be totally honest and confirm to you that... | 8 |
| 7 | The character "Dory" was depicted with symptom... | 3 |
| 8 | I will not say this film was excellent, and I ... | 6 |
| 9 | Pixar is the best! Of them all, Monsters, Inc.... | 10 |


In [27]:
# drop a row by its index label
df.drop([8], axis=0, inplace=True)

df

|   | review | rating |
|---|---|---|
| 0 | Toy Story is a sheer delight to view on the sc... | 10 |
| 1 | Toy Story is the film that started Pixar Anima... | 10 |
| 2 | Though I am not a big fan of computer animatio... | 9 |
| 3 | Toy story was a fun, imaginative renewal of th... | 8 |
| 4 | When Toy Story came, in 1995,I was 9 years old... | 7 |
| 5 | A great movie turned into a franchise the tale... | 8 |
| 6 | I'll be totally honest and confirm to you that... | 8 |
| 7 | The character "Dory" was depicted with symptom... | 3 |
| 9 | Pixar is the best! Of them all, Monsters, Inc.... | 10 |
| 10 | I only watched it on DVD, but I wish I was at ... | 10 |


In [28]:
# drop rows whose rating is between 3 and 7 (inclusive)
df.drop(df.loc[(df['rating'] >= 3) & (df['rating'] <= 7)].index, inplace=True)

df

|   | review | rating |
|---|---|---|
| 0 | Toy Story is a sheer delight to view on the sc... | 10 |
| 1 | Toy Story is the film that started Pixar Anima... | 10 |
| 2 | Though I am not a big fan of computer animatio... | 9 |
| 3 | Toy story was a fun, imaginative renewal of th... | 8 |
| 5 | A great movie turned into a franchise the tale... | 8 |
| 6 | I'll be totally honest and confirm to you that... | 8 |
| 9 | Pixar is the best! Of them all, Monsters, Inc.... | 10 |
| 10 | I only watched it on DVD, but I wish I was at ... | 10 |
| 12 | This movie didn't do anything right; the comed... | 1 |
| 13 | This movie is often compared to Shrek. I found... | 2 |
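
The same filtering can be written as a boolean mask instead of a call to `drop`. The sketch below uses a small made-up frame; `Series.between` is inclusive on both ends by default, and `~` negates the mask.

```python
import pandas as pd

# small made-up frame for illustration
frame = pd.DataFrame({'rating': [10, 7, 3, 1, 8, 5]})

# True for ratings from 3 to 7, both ends included
in_middle = frame['rating'].between(3, 7)

# keep only the rows where the mask is False
kept = frame[~in_middle]

print(kept)
```

As with `drop`, the surviving rows keep their original index labels.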


In [29]:
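# reset the index so the rows are numbered consecutively from 0 again
# drop=True discards the old index instead of keeping it as a column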
df.reset_index(drop=True, inplace=True)

df

|   | review | rating |
|---|---|---|
| 0 | Toy Story is a sheer delight to view on the sc... | 10 |
| 1 | Toy Story is the film that started Pixar Anima... | 10 |
| 2 | Though I am not a big fan of computer animatio... | 9 |
| 3 | Toy story was a fun, imaginative renewal of th... | 8 |
| 4 | A great movie turned into a franchise the tale... | 8 |
| 5 | I'll be totally honest and confirm to you that... | 8 |
| 6 | Pixar is the best! Of them all, Monsters, Inc.... | 10 |
| 7 | I only watched it on DVD, but I wish I was at ... | 10 |
| 8 | This movie didn't do anything right; the comed... | 1 |
| 9 | This movie is often compared to Shrek. I found... | 2 |
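
The effect of the `drop` argument of `reset_index` is easiest to see side by side. A minimal sketch on a made-up frame whose index has gaps:

```python
import pandas as pd

# made-up frame whose index has gaps, as after dropping rows
frame = pd.DataFrame({'rating': [10, 8, 1]}, index=[0, 5, 12])

# drop=True throws the old index away
renumbered = frame.reset_index(drop=True)
# the default keeps the old index as a new column named 'index'
with_old_index = frame.reset_index()

print(renumbered)
print(with_old_index)
```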


In [30]:
# assign 'positive' or 'negative' sentiment to rating

# create empty list
sentiment = []

# iterate over ratings
# (ratings from 3 to 7 were dropped above, so every remaining value matches one branch)
for value in df['rating']:
    if value <= 4:
        sentiment.append('negative')
    elif value >= 6:
        sentiment.append('positive')

print(sentiment)

['positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative']


In [31]:
# add sentiment to dataframe as a new column 
df['sentiment'] = sentiment

# drop rating column
df.drop(['rating'], axis=1, inplace=True)

df

|   | review | sentiment |
|---|---|---|
| 0 | Toy Story is a sheer delight to view on the sc... | positive |
| 1 | Toy Story is the film that started Pixar Anima... | positive |
| 2 | Though I am not a big fan of computer animatio... | positive |
| 3 | Toy story was a fun, imaginative renewal of th... | positive |
| 4 | A great movie turned into a franchise the tale... | positive |
| 5 | I'll be totally honest and confirm to you that... | positive |
| 6 | Pixar is the best! Of them all, Monsters, Inc.... | positive |
| 7 | I only watched it on DVD, but I wish I was at ... | positive |
| 8 | This movie didn't do anything right; the comed... | negative |
| 9 | This movie is often compared to Shrek. I found... | negative |
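
The labelling loop can also be written without explicit iteration. The sketch below swaps in `np.where` on a small made-up frame and assumes, as in the steps above, that no mid-range ratings remain, so every rating above 4 counts as positive.

```python
import numpy as np
import pandas as pd

# small made-up frame; assumes mid-range ratings were already removed
frame = pd.DataFrame({'rating': [10, 9, 1, 2, 8]})

# 'negative' where the condition holds, 'positive' everywhere else
frame['sentiment'] = np.where(frame['rating'] <= 4, 'negative', 'positive')

print(frame)
```

Unlike the loop, this always produces one label per row, so the new column cannot end up shorter than the DataFrame.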


Check out the pandas user guide for more functionality:
https://pandas.pydata.org/docs/user_guide/