In [1]:
import pandas as pd

# **Extra Tips**
This notebook collects a few extra pandas features that are good to know and help you work more effectively:
- `.str` and `.dt` accessor methods
- `pd.set_option` / `pd.reset_option` (find option names with `pd.describe_option()`)
- styling tables
- MultiIndex

## String and DateTime methods
Pandas lets you call string and datetime methods on a whole Series at once, applying them element-wise as if each value were a single string or timestamp:
- For string methods, put `.str` between the Series and the method, e.g. `s.str.upper()`.
- For datetime methods, put `.dt` between the Series and the method, e.g. `s.dt.hour`.

This is especially handy for creating derived columns.
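To make the element-wise behaviour concrete, here is a minimal sketch comparing `.str.upper()` with an explicit Python loop on a made-up Series `s`. One practical difference: the accessor leaves missing values (`NaN`) as missing, whereas calling `.upper()` on `NaN` in a plain loop would raise an error.

```python
import numpy as np
import pandas as pd

# A small, made-up Series containing one missing value
s = pd.Series(['apple', 'banana', np.nan])

# .str applies str.upper to every element and leaves missing values as missing
upper_vectorized = s.str.upper()

# Roughly equivalent explicit loop: call .upper() on each string, keep NaN as is
upper_loop = pd.Series([x.upper() if isinstance(x, str) else np.nan for x in s])

print(upper_vectorized.tolist())
print(upper_loop.tolist())
```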

In [2]:
df = pd.DataFrame({
    'mydate': pd.date_range(start='2020-01-01', periods=6, freq='6h'),  # 'h' = hours ('H' is deprecated since pandas 2.2)
    'mystr':[f'string_{i}' for i in range(6)],
})

df

|   | mydate | mystr |
|---|---|---|
| 0 | 2020-01-01 00:00:00 | string_0 |
| 1 | 2020-01-01 06:00:00 | string_1 |
| 2 | 2020-01-01 12:00:00 | string_2 |
| 3 | 2020-01-01 18:00:00 | string_3 |
| 4 | 2020-01-02 00:00:00 | string_4 |
| 5 | 2020-01-02 06:00:00 | string_5 |


In [3]:
df['item_id'] = df['mystr'].str.replace('string_', '#')
df['hour'] = df['mydate'].dt.hour
df['day'] = df['mydate'].dt.day

df

|   | mydate | mystr | item_id | hour | day |
|---|---|---|---|---|---|
| 0 | 2020-01-01 00:00:00 | string_0 | #0 | 0 | 1 |
| 1 | 2020-01-01 06:00:00 | string_1 | #1 | 6 | 1 |
| 2 | 2020-01-01 12:00:00 | string_2 | #2 | 12 | 1 |
| 3 | 2020-01-01 18:00:00 | string_3 | #3 | 18 | 1 |
| 4 | 2020-01-02 00:00:00 | string_4 | #4 | 0 | 2 |
| 5 | 2020-01-02 06:00:00 | string_5 | #5 | 6 | 2 |
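A few more accessor methods in the same spirit, as a minimal sketch on a 3-row version of the frame above. The new column names (`date_str`, `weekday`, `prefix`, `number`) are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'mydate': pd.date_range(start='2020-01-01', periods=3, freq='6h'),
    'mystr': [f'string_{i}' for i in range(3)],
})

# .dt: format timestamps as strings and get weekday names
df['date_str'] = df['mydate'].dt.strftime('%Y-%m-%d %H:%M')
df['weekday'] = df['mydate'].dt.day_name()

# .str: split on '_' into separate columns (expand=True returns a DataFrame)
parts = df['mystr'].str.split('_', expand=True)
df['prefix'] = parts[0]
df['number'] = parts[1].astype(int)

print(df)
```

Chaining accessors like this keeps each derived column a one-liner, without writing a Python loop over the rows.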


### ***EXERCISE 9.1***
Using the `df` provided below, get the mean score of people whose name starts with 'J'.

In [4]:
df = pd.DataFrame({
    'name': ['John', 'Albert', 'Jack', 'Josef', 'Bob', 'Juliette', 'Mary', 'Jane'],
    'score': [5, 8, 6, 4, 8, 7, 3, 5],
})
# insert solution here

In [5]:
df.loc[df['name'].str.startswith('J'), 'score'].mean()

5.4
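Note that `str.contains('J')` matches a 'J' anywhere in the name, so it would only give the right answer here because no name has a 'J' after its first letter. A small sketch, adding a hypothetical name 'AJ' that is not part of the exercise, shows where `contains` and `startswith` diverge, plus a case-insensitive variant:

```python
import pandas as pd

# Exercise data plus a hypothetical 'AJ', which contains 'J' but does not start with it
df = pd.DataFrame({
    'name': ['John', 'Albert', 'Jack', 'Josef', 'Bob', 'Juliette', 'Mary', 'Jane', 'AJ'],
    'score': [5, 8, 6, 4, 8, 7, 3, 5, 10],
})

# contains: 'J' anywhere in the name, so 'AJ' is wrongly included
contains_mean = df.loc[df['name'].str.contains('J'), 'score'].mean()

# startswith: only names whose first letter is 'J'
startswith_mean = df.loc[df['name'].str.startswith('J'), 'score'].mean()

# Case-insensitive version: lowercase first, then test the prefix
ci_mean = df.loc[df['name'].str.lower().str.startswith('j'), 'score'].mean()

print(contains_mean, startswith_mean, ci_mean)
```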

## Pandas Options
The default pandas options can be changed and reset using the convenient `set_option` and `reset_option` functions.
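Besides `get_option`/`set_option`, pandas exposes the same settings as attributes under `pd.options`, and `pd.describe_option` prints the documentation of every option whose name matches a pattern, which is handy for discovering option names. A minimal sketch:

```python
import pandas as pd

# get_option and the pd.options attribute tree read the same setting
print(pd.get_option('display.max_rows'))
print(pd.options.display.max_rows)

# Attribute assignment has the same effect as pd.set_option('display.max_rows', 20)
pd.options.display.max_rows = 20
assert pd.get_option('display.max_rows') == 20

# Back to the default value
pd.reset_option('display.max_rows')

# Print the built-in description of the matching option(s)
pd.describe_option('display.max_rows')
```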

For example, by default pandas displays at most 60 rows (`display.max_rows`); a DataFrame with more rows than that is truncated, with the hidden middle rows replaced by "...". We can raise or lower this limit as needed and then reset it to the default.

In [6]:
print('pandas default number of rows to display:', pd.get_option('display.max_rows'))

df = pd.DataFrame({'mycol': range(8)}) + 10  # 8 rows with values 10 to 17
df

pandas default number of rows to display: 60


|   | mycol |
|---|---|
| 0 | 10 |
| 1 | 11 |
| 2 | 12 |
| 3 | 13 |
| 4 | 14 |
| 5 | 15 |
| 6 | 16 |
| 7 | 17 |


In [7]:
print('Overriding the limit: show at most 4 rows...')
pd.set_option('display.max_rows', 4)

df

Overriding the limit: show at most 4 rows...


|     | mycol |
|-----|-------|
| 0   | 10    |
| 1   | 11    |
| ... | ...   |
| 6   | 16    |
| 7   | 17    |


In [8]:
print('Resetting to the default 60')
pd.reset_option('display.max_rows')

df

Resetting to the default 60


|   | mycol |
|---|---|
| 0 | 10 |
| 1 | 11 |
| 2 | 12 |
| 3 | 13 |
| 4 | 14 |
| 5 | 15 |
| 6 | 16 |
| 7 | 17 |
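Setting an option and then remembering to reset it, as done above, can also be written with `pd.option_context`, a context manager that applies the override only inside a `with` block and restores the previous value on exit. A minimal sketch reusing the same 8-row `df`:

```python
import pandas as pd

df = pd.DataFrame({'mycol': range(8)}) + 10

with pd.option_context('display.max_rows', 4):
    # The override is active only inside this block
    inside = pd.get_option('display.max_rows')
    text_inside = repr(df)  # truncated: the middle rows are replaced by '...'

# Leaving the block restores the previous value (60 unless it was changed earlier)
outside = pd.get_option('display.max_rows')

print(text_inside)
print(inside, outside)
```

Because the previous value is restored even if the code inside the block raises an exception, this avoids leaving the notebook with surprising display settings.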
