In [1]:
# Python Programming Examples: https://www.geeksforgeeks.org/python-programming-examples/
# A guide to programming with Python: https://learn.datacamp.com/career-tracks/data-scientist-with-python
# Python Data Science Handbook, including examples in Colab: https://jakevdp.github.io/PythonDataScienceHandbook/
# pandas cookbook: https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook
# Recommended machine learning blog by Jason Brownlee: https://machinelearningmastery.com/start-here/


In today's notebook, we will make sure that you are comfortable with some basic Python functionality.




# Pandas

In [2]:
# Create a toy DataFrame from a dictionary
import pandas as pd

# Build a dictionary that maps each column name to a list of values.
data = {'Name': ['Johnathan', 'Amit', 'Rachel', 'Joe', 'Jennifer'],
        'Age': [20, 31, 29, 32, 28]}
 
# create a DataFrame
df = pd.DataFrame(data)
 
# print the output.
print(df)


        Name  Age
0  Johnathan   20
1       Amit   31
2     Rachel   29
3        Joe   32
4   Jennifer   28


In [5]:
# Display all records for people over 30 years old
df[df['Age']>30]

   Name  Age
1  Amit   31
3   Joe   32
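The filter above works because the comparison produces a boolean Series that is then used as a row mask. The sketch below spells out that intermediate step and shows how two conditions can be combined; the variable names (`people`, `mask`, `late_twenties`) are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same toy data as in the cell above, under a hypothetical name.
people = pd.DataFrame({
    'Name': ['Johnathan', 'Amit', 'Rachel', 'Joe', 'Jennifer'],
    'Age': [20, 31, 29, 32, 28],
})

# The comparison yields a boolean Series with one value per row.
mask = people['Age'] > 30

# Indexing with the mask keeps only the rows where it is True.
over_30 = people[mask]

# Conditions combine with & (and) and | (or); each one needs its own parentheses.
late_twenties = people[(people['Age'] >= 25) & (people['Age'] < 30)]

print(mask.tolist())
print(over_30['Name'].tolist())
print(late_twenties['Name'].tolist())
```

Python's `and` / `or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.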


In [6]:
# Make a copy of the DataFrame and change the names to lower case; print both DataFrames to see the difference.

In [7]:
df2 = df.copy()
df2['Name'] = df2['Name'].str.lower()
print(df2)
print(df)

        Name  Age
0  johnathan   20
1       amit   31
2     rachel   29
3        joe   32
4   jennifer   28
        Name  Age
0  Johnathan   20
1       Amit   31
2     Rachel   29
3        Joe   32
4   Jennifer   28
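The original DataFrame keeps its capitalised names because `copy()` creates an independent object. As a rough illustration of why the copy matters, the sketch below contrasts it with plain assignment, which only adds a second name for the same object. The names `original`, `alias` and `independent` are made up for this example.

```python
import pandas as pd

original = pd.DataFrame({'Name': ['Amit', 'Joe'], 'Age': [31, 32]})

alias = original               # a second name for the very same object
independent = original.copy()  # a separate object with its own data

# Changing the copy leaves the original untouched.
independent['Name'] = independent['Name'].str.lower()

# Changing the alias changes the original, because they are one object.
alias['Age'] = alias['Age'] + 1

print(alias is original)
print(original)
print(independent)
```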


In [11]:
# Add new column to the dataFrame 
# Declare a list that is to be converted into a column 
address = ['Dallas', 'Paris', 'NY', 'Turin',None] 
  
# Using 'Address' as the column name 
# and equating it to the list 
df['Address'] = address 
  
# Observe the result 
df 

        Name  Age  Address
0  Johnathan   20   Dallas
1       Amit   31    Paris
2     Rachel   29       NY
3        Joe   32    Turin
4   Jennifer   28     None


In [12]:
# Replace the None value in 'Address' with a placeholder string, using inplace=True. The inplace argument:

When inplace=True, the data is modified in place: the DataFrame itself is updated, so the result of the call should not be assigned back (the call has traditionally returned None).

When inplace=False, which is the default, the operation returns a modified copy and leaves the original unchanged. You then need to assign the result to a variable to keep it.
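The two modes can be compared side by side. This is a minimal sketch on a made-up three-row frame (`cities`, with the placeholder string `'missing'`), not the notebook's own data. It uses the DataFrame-level `fillna` with a `{column: value}` dictionary.

```python
import pandas as pd

# Hypothetical toy frame with one missing value.
cities = pd.DataFrame({'Address': ['Dallas', None, 'NY']})

# Default (inplace=False): a new object comes back, the original is untouched.
filled = cities.fillna({'Address': 'missing'})
had_missing_before = bool(cities['Address'].isna().any())

# inplace=True: the original object itself is updated; nothing is assigned.
cities.fillna({'Address': 'missing'}, inplace=True)
has_missing_after = bool(cities['Address'].isna().any())

print(filled)
print(had_missing_before, has_missing_after)
```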

In [13]:
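# Call fillna on the DataFrame itself: df['Address'].fillna(..., inplace=True) may act on a temporary copy in recent pandas versions.
df.fillna({'Address': 'Unkown'}, inplace=True)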

In [14]:
df

        Name  Age  Address
0  Johnathan   20   Dallas
1       Amit   31    Paris
2     Rachel   29       NY
3        Joe   32    Turin
4   Jennifer   28   Unkown


In [15]:
# How to read and write files with pandas: https://realpython.com/pandas-read-write-files/
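As a small, self-contained illustration of reading and writing with pandas, the sketch below writes a toy frame to a CSV file in a temporary directory and reads it back. The file name `people.csv` and the variable names are assumptions made for this example only.

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({'Name': ['Amit', 'Joe'], 'Age': [31, 32]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'people.csv')
    # index=False keeps the row labels out of the file.
    people.to_csv(csv_path, index=False)
    # Read the file back into a new DataFrame.
    restored = pd.read_csv(csv_path)

print(restored)
```

If the index is written to the file and not declared when reading it back, it reappears as an extra column named `Unnamed: 0`, which is why `index=False` is used here.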

# Load Data

Let's load the Pokemon dataset from Kaggle: [Pokemon dataset](https://www.kaggle.com/rounakbanik/pokemon?select=pokemon.csv). You can either download it to your PC and then upload it to Colab, or use Kaggle to download it directly into Colab: [Easiest way to download kaggle data in Google Colab](https://www.kaggle.com/general/74235).

Alternatively, you can connect this notebook to your Google Drive. Make sure you have downloaded the dataset to your Google Drive and that a matching folder path exists there.

In [17]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [23]:
# Make sure you have downloaded the dataset and created this path; alternatively, change this path
import os
PATH = '/content/drive/My Drive/Recommender Systems/Introduction'
os.chdir(PATH)

In [32]:
# For demonstration, let's load only the 'name', 'type1' and 'type2' columns and display the first 10 rows of the selected data.
df = pd.read_csv('pokemon.csv',usecols=['name','type1','type2'])
df.head(10)

         name  type1   type2
0   Bulbasaur  grass  poison
1     Ivysaur  grass  poison
2    Venusaur  grass  poison
3  Charmander   fire     NaN
4  Charmeleon   fire     NaN
5   Charizard   fire  flying
6    Squirtle  water     NaN
7   Wartortle  water     NaN
8   Blastoise  water     NaN
9    Caterpie    bug     NaN


In [33]:
# Let's select the rows with index labels 4 through 8 (inclusive), save them in a new DataFrame, 'df2', and print it.

In [34]:
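# .copy() makes df2 independent of df, so the in-place set_index further down does not act on a slice of df.
df2 = df.loc[4:8].copy()
print(df2)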

         name  type1   type2
4  Charmeleon   fire     NaN
5   Charizard   fire  flying
6    Squirtle  water     NaN
7   Wartortle  water     NaN
8   Blastoise  water     NaN
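Row 8 is included above because `loc` slices by label and includes the end label, unlike ordinary Python slicing. The sketch below contrasts it with position-based `iloc` on a made-up ten-row frame (`letters` is a hypothetical name).

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9.
letters = pd.DataFrame({'letter': list('abcdefghij')})

by_label = letters.loc[4:8]      # labels 4 to 8, end label included
by_position = letters.iloc[4:8]  # positions 4 to 7, end position excluded

print(by_label['letter'].tolist())
print(by_position['letter'].tolist())
```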


In [35]:
# Use the name of each Pokemon as the index of this DataFrame

In [37]:
df2.set_index('name',inplace=True)

In [38]:
# Note that the index now holds strings (the names) rather than integers.
df2

            type1   type2
name
Charmeleon   fire     NaN
Charizard    fire  flying
Squirtle    water     NaN
Wartortle   water     NaN
Blastoise   water     NaN
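With a string index, rows can be looked up by name. The sketch below rebuilds three of the rows shown above by hand (so it runs without the CSV file) and uses the non-in-place form of `set_index`. The variable names are illustrative.

```python
import pandas as pd

# Three rows copied from the output above, typed in by hand.
pokemon = pd.DataFrame({
    'name': ['Charmeleon', 'Charizard', 'Squirtle'],
    'type1': ['fire', 'fire', 'water'],
    'type2': [None, 'flying', None],
})

# Without inplace=True, set_index returns a new DataFrame.
by_name = pokemon.set_index('name')

row = by_name.loc['Charizard']              # one row, returned as a Series
primary = by_name.loc['Squirtle', 'type1']  # a single cell

# reset_index turns the index back into an ordinary column.
restored = by_name.reset_index()

print(row)
print(primary)
```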


In [40]:
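# Let's keep only the Pokemon in df2 whose names start with 'Ch'.
# regex='^Ch' anchors the match at the start of the label; like='Ch' would match 'Ch' anywhere in it.
df2.filter(regex='^Ch', axis=0)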

            type1   type2
name
Charmeleon   fire     NaN
Charizard    fire  flying
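The `like` argument of `filter` matches a substring anywhere in the label, not only at the start. The sketch below, on a hand-built frame using four of the names above (`types` is a hypothetical name), compares `like` with two ways of matching a prefix.

```python
import pandas as pd

types = pd.DataFrame(
    {'type1': ['fire', 'fire', 'water', 'water']},
    index=['Charmeleon', 'Charizard', 'Squirtle', 'Wartortle'],
)

# like='ar' keeps every label that contains 'ar' anywhere.
contains_ar = types.filter(like='ar', axis=0)

# regex='^Ch' keeps only labels that begin with 'Ch'.
starts_with_ch = types.filter(regex='^Ch', axis=0)

# The same prefix test written as a boolean mask on the index.
starts_with_ch_mask = types[types.index.str.startswith('Ch')]

print(contains_ar.index.tolist())
print(starts_with_ch.index.tolist())
```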
