### <span style="color:black"><b>Pandas Tutorial 3</b></span>

---

<ins>Selecting, creating & renaming columns</ins>

* In this tutorial you'll see how to select, create and rename columns using the pandas library
* Basic knowledge of python [dictionaries](https://www.w3schools.com/python/python_dictionaries.asp) will help you get the most out of this
* [f-strings](https://realpython.com/python-f-strings/) and [list comprehensions](https://www.programiz.com/python-programming/list-comprehension) are a bonus but aren't essential
* I'll also introduce the at times controversial `inplace` argument

Useful dataframe methods:
<pre>
df.select_dtypes()
df.rename(columns = ..., inplace = True)
</pre>

In [None]:
import pandas as pd

In [None]:
# Recap of the python dictionary
mydict = {'Name':'Noah', 'Age':19}
mydict['Age']

In [None]:
# Read in data
df = pd.read_csv('twitch.csv')

# First five rows
df.head()

<span style="color:black"><b>Selecting A Pandas Series (Column)</b></span> 


Exercise: Select a column in two different ways and outline the advantages and disadvantages of each method

In [None]:
# Method 1: Use dictionary syntax
df['Peak viewers']

In [None]:
# Method 2: Use dot syntax
df.Channel

In [None]:
# Example 1 of where dot syntax fails
# df.T does not select a column named 'T'
df.T

# Reason: it clashes with the built-in pandas attribute that transposes the dataframe

In [None]:
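# Example 2 of where dot syntax fails
# df.Peak viewers   <- raises a SyntaxError, so it is left commented out
# Reason: The name of the series has whitespace between the words
# Dictionary syntax handles the whitespace without any problem
df['Peak viewers']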
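
Both failure cases can be reproduced on a tiny stand-in dataframe. This is only a sketch: the data is made up for illustration (it is not the twitch dataset), and the column named 'T' is included on purpose to show the clash with the transpose attribute.

```python
import pandas as pd

# Hypothetical toy data, not the twitch dataset
toy = pd.DataFrame({'T': [1, 2], 'Peak viewers': [300, 400]})

# Dictionary syntax always returns the column, whatever its name
col_t = toy['T']
peak = toy['Peak viewers']

# Dot syntax returns the transposed dataframe, not the 'T' column
transposed = toy.T

print(type(col_t).__name__)       # Series
print(type(transposed).__name__)  # DataFrame
```

Dictionary syntax is the safer default; dot syntax is only a shortcut for column names that are valid python identifiers and do not collide with existing dataframe attributes.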

<span style="color:black"><b>Selecting Multiple Columns</b></span>

To select multiple columns, pass the names of the columns as strings inside a python list.

Exercise: Select just the 'Channel', 'Followers gained' and 'Language' columns

In [None]:
df[['Channel', 'Followers gained', 'Language']]
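
To see what the inner list changes, here is a small sketch on made-up data (the values are hypothetical, only the column names mirror the dataset). A single string returns a series, while a list of strings returns a dataframe, even when the list holds just one name.

```python
import pandas as pd

# Hypothetical toy data, not the twitch dataset
toy = pd.DataFrame({
    'Channel': ['a', 'b'],
    'Language': ['English', 'Korean'],
    'Followers gained': [10, 20],
})

one_column = toy['Channel']            # a string gives a Series
one_column_frame = toy[['Channel']]    # a list gives a DataFrame
subset = toy[['Channel', 'Language']]  # columns come back in the order listed

print(type(one_column).__name__)        # Series
print(type(one_column_frame).__name__)  # DataFrame
print(list(subset.columns))
```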

<span style="color:black"><b>Selecting Only Numeric or String Columns</b></span> 

* For this we use the `select_dtypes()` dataframe method, which lets us specify whether we want the numeric columns or the text columns
* Pass in a dtype for the `include` argument. A list with multiple dtypes is allowed too
* Documentation for `select_dtypes()` can be found [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.select_dtypes.html) 

In [None]:
# Numeric only
df.select_dtypes(include = 'number')

In [None]:
# Text/object columns only 
df.select_dtypes(include = 'object')
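
As a sketch of the list form mentioned above, the example below passes two dtypes to `include` and also uses the `exclude` argument of `select_dtypes()`. The dataframe and its 'Partnered' column are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the twitch dataset
toy = pd.DataFrame({
    'Channel': ['a', 'b'],
    'Followers': [10, 20],
    'Partnered': [True, False],
})

# A list selects every column that matches any of the dtypes
numeric_or_bool = toy.select_dtypes(include=['number', 'bool'])

# exclude works the other way round: keep everything that is not numeric
not_numeric = toy.select_dtypes(exclude='number')

print(list(numeric_or_bool.columns))
print(list(not_numeric.columns))
```

Note that boolean columns do not count as 'number', which is why 'bool' has to be listed separately.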

<span style="color:black"><b>Creating New Columns</b></span> 

* Creating new columns is neither tedious nor difficult when working with pandas dataframes
* In fact, when creating a new column, we can use dictionary syntax (dot syntax is not allowed here) as if the column were already in our dataset

Exercise: Create a new column named 'Testing' that has the value of 99.9 (just to see how this can be done)

In [None]:
# Solution:
df['Testing'] = 99.9
df.head()
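
To illustrate why dot syntax is not allowed when creating a column, here is a small sketch on made-up data. Assigning through a dot only attaches a plain python attribute to the dataframe object, so no column appears.

```python
import pandas as pd

# Hypothetical toy data, not the twitch dataset
toy = pd.DataFrame({'Channel': ['a', 'b']})

# Dot syntax: sets an attribute on the object, no column is created
toy.Testing = 99.9
created_by_dot = 'Testing' in toy.columns

# Dictionary syntax: creates the column and fills every row with the value
toy['Testing'] = 99.9
created_by_brackets = 'Testing' in toy.columns

print(created_by_dot)       # False
print(created_by_brackets)  # True
```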

Exercise: Create a new pandas series called 'followers/views ratio' that divides Followers gained by Views gained

In [None]:
# Solution:
df['followers/views ratio'] = df['Followers gained'] / df['Views gained']
df.head()

Exercise: Create a new pandas series called 'followers bigger than views gained???' that is True when Followers gained is bigger than Views gained, otherwise False

In [None]:
# Solution
df['followers bigger than views gained???'] = df['Followers gained'] > df['Views gained']

In [None]:
# See if the changes worked
df.head()
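
Both solutions rely on pandas working row by row: dividing or comparing two series pairs up the values in each row. A minimal sketch with made-up numbers (not taken from the twitch dataset) makes the result easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data, not the twitch dataset
toy = pd.DataFrame({
    'Followers gained': [50, 30, 200],
    'Views gained': [100, 60, 100],
})

# Division is applied to each row: 50/100, 30/60, 200/100
toy['followers/views ratio'] = toy['Followers gained'] / toy['Views gained']

# The comparison is also applied to each row and gives True/False values
toy['followers bigger than views gained???'] = toy['Followers gained'] > toy['Views gained']

print(toy)
```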

Extension exercise: Create a new pandas series called 'description' that gives a summary of the channel

* The first row should say that 'xQcOW is a(n) English channel'
* The second row should say that 'summit1g is a(n) English channel'
* The third row should say that 'Gaules is a(n) Portuguese channel'

Prerequisites:
* Knowledge about list comprehensions and `zip()`

OR

* Knowledge of the pandas `df.apply()` method

In [None]:
# One line solution
df['description'] = [f"{ch} is a(n) {lang} channel" for ch, lang in zip(df.Channel, df.Language)]
df.head()
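
For the other prerequisite route, here is a sketch of the same idea using `df.apply()` with `axis=1`, which hands each row to a function. The two-row dataframe is a made-up stand-in that reuses channel names from the exercise text.

```python
import pandas as pd

# Hypothetical toy data that mirrors two rows from the exercise text
toy = pd.DataFrame({
    'Channel': ['xQcOW', 'Gaules'],
    'Language': ['English', 'Portuguese'],
})

# axis=1 means the function receives one row at a time
toy['description'] = toy.apply(
    lambda row: f"{row['Channel']} is a(n) {row['Language']} channel",
    axis=1,
)

# The list comprehension with zip() builds the same strings
with_zip = [f"{ch} is a(n) {lang} channel"
            for ch, lang in zip(toy['Channel'], toy['Language'])]

print(toy['description'].tolist())
```

The two approaches produce the same strings; the list comprehension tends to be quicker on large dataframes because `apply` with `axis=1` calls the function once per row.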

<span style="color:black"><b>Renaming Columns</b></span> 

A very common way to rename columns is to use the `df.rename()` command from pandas

Exercise: Rename the pandas series 'Channel' to 'Account Name' and rename 'Language' to 'Nationality'

* For this we need to use the [rename](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html) dataframe method from pandas 
* When using the `columns` argument, the key of the dictionary is the old series name and the value is the name you want to change it to. Easy!


In [None]:
# Solution
df.rename(columns = {'Channel':'Account Name', 'Language':'Nationality'}, inplace = True)

In [None]:
# Check the changes
df.head()

* If the rename is run without the `inplace = True` argument (as on the first attempt), the change is not actually applied to `df`
* The idea is that pandas does not want to make any permanent changes without us being fully aware of it
* So without `inplace = True`, `df.rename()` returns a new dataframe showing what the change looks like and leaves `df` itself untouched
* If we are sure that the change is the one we want, we set the `inplace` argument to `True` and hence commit to the change; assigning the result back with `df = df.rename(...)` has the same effect
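
The difference can be checked on a toy dataframe. This is a sketch with a made-up one-row dataframe; the variable names `preview` and `new_names` are only for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the twitch dataset
toy = pd.DataFrame({'Channel': ['xQcOW'], 'Language': ['English']})
new_names = {'Channel': 'Account Name', 'Language': 'Nationality'}

# Without inplace: a renamed copy comes back, toy keeps its old names
preview = toy.rename(columns=new_names)
columns_after_preview = list(toy.columns)

# With inplace = True: toy itself is changed
toy.rename(columns=new_names, inplace=True)
columns_after_inplace = list(toy.columns)

print(columns_after_preview)  # ['Channel', 'Language']
print(columns_after_inplace)  # ['Account Name', 'Nationality']
```

Part of the reason `inplace` is controversial is that many people prefer the reassignment style, `df = df.rename(columns = ...)`, because it makes the change visible in the code and can be chained with other methods.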