
## Pandas Notes – Introduction

### * What is Pandas?

* Pandas is a powerful Python library for **data analysis** and **data manipulation**
* Built on top of NumPy
* Designed to work with **tabular or labeled data**

---

### * Key Features of Pandas

* Read and write data from multiple formats (CSV, Excel, SQL, etc.)
* Handle **large datasets** efficiently
* Support **label-based slicing**, indexing, and subsetting
* Detect and handle **missing data**
* Clean and preprocess data easily
* Insert/delete rows and columns
* Align data across multiple labels
* **Reshape** and **pivot** data
* Perform **aggregations** using `groupby()`
* Merge and join different datasets
* Work with **time series** data
* Perform **rolling/window** operations

---

### * Main Data Structures in Pandas

* `Series`: One-dimensional labeled array (like a column in a table)
  * Each element has a corresponding **index**
  * Similar to NumPy arrays but with labels

* `DataFrame`: Two-dimensional labeled data structure
  * Collection of `Series` aligned in rows and columns
  * Like a table in a spreadsheet or SQL
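
As a minimal sketch of the two structures described above (the item names and numbers are made-up sample data), the following builds a `Series` and a `DataFrame` and shows that a single DataFrame column comes back as a `Series`:

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
prices = pd.Series([1.5, 2.0, 0.75], index=['apple', 'bread', 'milk'], name='price')

# A DataFrame: several columns that share one row index.
items = pd.DataFrame(
    {'price': [1.5, 2.0, 0.75], 'qty': [4, 1, 2]},
    index=['apple', 'bread', 'milk'],
)

# Selecting a single column of a DataFrame gives back a Series.
price_column = items['price']
print(type(price_column).__name__)   # Series
print(price_column.equals(prices))   # True
```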

---

### * Why Use Pandas?

* Simple, readable syntax for complex data tasks
* Efficient handling of tabular data
* Ideal for real-world, practical data analysis

---

### * Summary

* Pandas is a high-level tool for handling real-world data
* Offers intuitive and powerful tools for loading, cleaning, manipulating, analyzing, and visualizing structured data
* Next: Learn about `Series` and `DataFrame` in detail

## Pandas Notes – Creating a Series

### * What is a Series?

* A `Series` is a one-dimensional labeled array in pandas
* Similar to NumPy arrays, but each value has an associated **index**
* Can hold any data type (e.g. int, float, string), but all values in one Series share a single dtype

---

### * Syntax to Create a Series

```python
pd.Series(data, index, dtype, name)
```

* `data`: array-like (list, tuple, NumPy array, dict, or scalar)
* `index`: array-like (optional; if not given, auto-generated as 0, 1, 2, ...)
* `dtype`: data type of values (optional)
* `name`: optional label for the Series

---

### * Basic Examples

```python
import pandas as pd

data = [100, 200, 300, 400, 500]
index = [0, 1, 2, 3, 4]
s = pd.Series(data, index)
```

* You can access values using index labels:
  * `s[3]` → `400`
* You can also pass the data as a tuple or a NumPy array

---

### * Using Custom Indexes

```python
months = ['Jan', 'Feb', 'Mar', 'Apr']
revenue = (1000, 1200, 1300, 900)
rev_series = pd.Series(revenue, index=months)
```

* Access with label:
  * `rev_series['Feb']` → `1200`
* Access multiple:
  * `rev_series[['Jan', 'Mar']]`

---

### * Data Type (dtype) Handling

* Pandas will infer the data type, or you can specify it:
  * `pd.Series(data, index, dtype='float')`
  * Long double: `dtype='g'`

* If values have mixed types, dtype becomes `object`
* Forcing a numeric dtype onto values that cannot be converted (e.g. `dtype='float'` with the string `'abc'`) raises an error
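
The dtype rules above can be checked with a short sketch. The sample values are illustrative, and the exact exception type may vary between pandas versions, so both `ValueError` and `TypeError` are caught here:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])                    # dtype inferred as integer
floats = pd.Series([1, 2, 3], dtype='float')   # dtype forced to float64
mixed = pd.Series([1, 'two', 3.0])             # mixed values fall back to object

print(ints.dtype, floats.dtype, mixed.dtype)

# Forcing a numeric dtype on values that cannot be converted raises an error.
conversion_failed = False
try:
    pd.Series([1, 'two', 3.0], dtype='float')
except (ValueError, TypeError) as exc:
    conversion_failed = True
    print('conversion failed:', type(exc).__name__)
```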

---

### * Auto Index and Empty Series

* If `index` is not specified → auto index (0, 1, 2, ...)
* If only `index` is given, every value is `NaN`

```python
pd.Series(index=['a', 'b', 'c'])
```
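
A small sketch of the index-only case, together with scalar data (which the syntax section lists as a valid `data` argument). The `dtype='float64'` argument is an assumption added here so the result does not depend on version-specific defaults:

```python
import pandas as pd

labels = ['a', 'b', 'c']

# Index only: every value is missing (dtype given explicitly to avoid ambiguity).
empty = pd.Series(index=labels, dtype='float64')

# Scalar data: the single value is repeated for every index label.
filled = pd.Series(5, index=labels)

print(empty.isna().all())   # True
print(filled.tolist())      # [5, 5, 5]
```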

---

### * Creating Series from Dictionary

```python
d = {'a': 10, 'b': 20, 'c': 30}
pd.Series(d)
```

* Dictionary keys become index
* Dictionary values become data
* You can also specify the index explicitly to reorder or subset:
  * `pd.Series(d, index=['b', 'a'])`
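
The sketch below shows reordering and subsetting with an explicit index, plus what happens when a requested label is not a key in the dictionary (the label `'z'` is a made-up example):

```python
import pandas as pd

d = {'a': 10, 'b': 20, 'c': 30}

# Reorder and subset by listing the keys you want.
subset = pd.Series(d, index=['b', 'a'])

# A label that is not a key in the dict gets NaN.
with_missing = pd.Series(d, index=['b', 'a', 'z'])

print(subset.tolist())               # [20, 10]
print(with_missing.isna().tolist())  # [False, False, True]
```

Because `NaN` is a float, a Series with a missing label like this ends up with a float dtype even though the dictionary held integers.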

---

### * Summary

* Series are like labeled NumPy arrays
* They support various data types and allow powerful indexing
* Can be created from lists, tuples, arrays, dictionaries, or scalars

In [492]:
import pandas as pd

In [493]:
index_list = [0,1,2,3,4,5]

In [494]:
data_list = [100,200,300,400,500,600]

In [495]:
series_1 = pd.Series(data=data_list, index=index_list)

In [496]:
series_1

Unnamed: 0,0
0,100
1,200
2,300
3,400
4,500
5,600


In [497]:
series_1[4]

np.int64(500)

In [498]:
series_1[[4]]

Unnamed: 0,0
4,500


In [499]:
month_index_2025 = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

In [500]:
revenue_data_2025 = (1000000,2000000,3000000,4000000,5000000,6000000,7000000,8000000,9000000,10000000,11000000,12000000)

In [501]:
revenue_series = pd.Series(data=revenue_data_2025, index=month_index_2025)

In [502]:
revenue_series

Unnamed: 0,0
Jan,1000000
Feb,2000000
Mar,3000000
Apr,4000000
May,5000000
Jun,6000000
Jul,7000000
Aug,8000000
Sep,9000000
Oct,10000000
Nov,11000000
Dec,12000000


In [503]:
type(revenue_series)

In [504]:
revenue_series['Jul']

np.int64(7000000)

In [505]:
revenue_series[['Jul']]

Unnamed: 0,0
Jul,7000000


In [506]:
revenue_series[['Jul','Aug','Sep']]

Unnamed: 0,0
Jul,7000000
Aug,8000000
Sep,9000000


In [507]:
revenue_series = pd.Series(data=revenue_data_2025, index=month_index_2025,dtype='float')

In [508]:
revenue_series

Unnamed: 0,0
Jan,1000000.0
Feb,2000000.0
Mar,3000000.0
Apr,4000000.0
May,5000000.0
Jun,6000000.0
Jul,7000000.0
Aug,8000000.0
Sep,9000000.0
Oct,10000000.0
Nov,11000000.0
Dec,12000000.0


In [509]:
dict_1={'key1':'value1','key2':'value2','key3':'value3','key4':'value4'}

In [510]:
pd.Series(data=dict_1,index=dict_1)

Unnamed: 0,0
key1,value1
key2,value2
key3,value3
key4,value4


## Pandas Notes – DataFrames

### * What is a DataFrame?

* A DataFrame is a 2D labeled data structure in pandas (rows and columns)
* Think of it like a table or spreadsheet
* Internally, it's a collection of Series sharing the same index

---

### * Creating a DataFrame

```python
import pandas as pd

data = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
    [100, 110, 120],
    [130, 140, 150]
]
columns = ['A', 'B', 'C']
index = [0, 1, 2, 3, 4]

df = pd.DataFrame(data, columns=columns, index=index)
```

* `data`: 2D array-like structure (lists, tuples, NumPy arrays)
* `columns`: list of column names (length must match number of columns)
* `index`: list of row labels (length must match number of rows)
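
A DataFrame can also be built from a dictionary of columns. This is a minimal sketch with made-up row labels (`r1` to `r3`):

```python
import pandas as pd

# Each key becomes a column name; each list holds that column's values.
scores = pd.DataFrame(
    {'A': [10, 40, 70], 'B': [20, 50, 80], 'C': [30, 60, 90]},
    index=['r1', 'r2', 'r3'],
)

print(scores.shape)            # (3, 3)
print(list(scores.columns))    # ['A', 'B', 'C']
print(scores.loc['r2', 'B'])   # 50
```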

---

### * Indexing with Columns

```python
df.set_index('C')
```

* Returns a new DataFrame with column `C` as the index (`C` is removed from the columns by default); `df` itself is unchanged
* To **keep** the column: `df.set_index('C', drop=False)`
* To make the change **in-place**: `df.set_index('C', inplace=True)`
* To **append** to existing index: `df.set_index('C', append=True)`
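
To illustrate the options above, here is a short sketch on a small made-up frame (`df_demo` is a hypothetical name) showing that `set_index()` leaves the original untouched unless `inplace=True` is used:

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['x', 'y', 'z']})

indexed = df_demo.set_index('C')            # returns a new DataFrame
kept = df_demo.set_index('C', drop=False)   # 'C' stays as a column too

print(list(df_demo.columns))   # ['A', 'B', 'C']  (original is unchanged)
print(list(indexed.columns))   # ['A', 'B']
print(list(indexed.index))     # ['x', 'y', 'z']
print(list(kept.columns))      # ['A', 'B', 'C']
```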

---

### * Resetting the Index

```python
df.reset_index()
```

* Moves the current index into a regular column and replaces it with the default index (0, 1, 2, …)
* To **drop** the current index completely: `df.reset_index(drop=True)`
* To make the reset **in-place**: `df.reset_index(inplace=True)`
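
The difference between keeping and dropping the old index can be sketched as follows (the frame and its labels are made-up sample data):

```python
import pandas as pd

df_demo = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])

moved = df_demo.reset_index()             # old labels land in a column named 'index'
dropped = df_demo.reset_index(drop=True)  # old labels are discarded

print(list(moved.columns))    # ['index', 'A']
print(list(dropped.columns))  # ['A']
print(list(dropped.index))    # [0, 1, 2]
```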

---

### * Creating DataFrame from Sets, Tuples

```python
data = [
    {1, 2, 3},
    (4, 5, 6),
    [7, 8, 9],
    [10, 11, 12],
    [13, 14, 15]
]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
```

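* A list of mixed row containers (sets, tuples, lists) is accepted if every row has the same length, but sets are unordered and drop duplicates, so the values of a set row may land in a different column order; prefer lists or tuples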

---

### * Reading CSV into DataFrame

```python
df = pd.read_csv('filename.csv')
```

* `pd.read_csv()` reads a CSV file into a DataFrame
* If CSV has no header row:
  * `pd.read_csv('file.csv', header=None)`
* If the file is in another folder, use full path:
  * `pd.read_csv('/Users/Name/Desktop/file.csv')`
* Common arguments:
  * `sep=','` → specify separator (default is comma)
  * `index_col=0` → use first column as index
  * `usecols=[0, 2]` or `usecols=['Country', 'Population']`
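
The arguments above can be tried without a file on disk by reading from an in-memory string. This is only a sketch: the rows are made-up sample data, and the `names=` argument (not covered in the notes above) is used here to label the columns of a headerless file.

```python
import io
import pandas as pd

# In-memory stand-in for a headerless CSV file.
raw = "1,Alpha,100\n2,Beta,200\n3,Gamma,300\n"

# header=None: treat the first line as data; names= supplies column labels.
table = pd.read_csv(io.StringIO(raw), header=None, names=['Rank', 'Name', 'Value'])

# usecols + index_col: load only two columns and use 'Rank' as the row index.
slim = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=['Rank', 'Name', 'Value'],
    usecols=['Rank', 'Value'],
    index_col='Rank',
)

print(table.shape)            # (3, 3)
print(list(slim.columns))     # ['Value']
print(slim.loc[2, 'Value'])   # 200
```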

---

### * Summary

* DataFrames are 2D labeled tables, great for structured data
* Can be created from lists, sets, NumPy arrays, dictionaries, or CSV files
* `set_index()` and `reset_index()` let you control row labels
* Use `pd.read_csv()` to import data files as DataFrames

In [511]:
import pandas as pd

In [512]:
row_labels = ['a','b','c','d','e']
column_labels = ['A','B','C']
data = [[2,4,1],[5,3,1],[7,3,7],[2,6,9],[1,6,8]]

In [513]:
data = [[2,4,1],(5,3,1),{7,3,1},[2,6,9],[1,6,8]]

In [514]:
df =pd.DataFrame(data=data,index=row_labels,columns=column_labels)

In [515]:
df

Unnamed: 0,A,B,C
a,2,4,1
b,5,3,1
c,1,3,7
d,2,6,9
e,1,6,8


In [516]:
type(df)

In [517]:
df.set_index('A')

Unnamed: 0_level_0,B,C
A,Unnamed: 1_level_1,Unnamed: 2_level_1
2,4,1
5,3,1
1,3,7
2,6,9
1,6,8


In [518]:
df.set_index('A',drop=False)

Unnamed: 0_level_0,A,B,C
A,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2,2,4,1
5,5,3,1
1,1,3,7
2,2,6,9
1,1,6,8


In [519]:
df

Unnamed: 0,A,B,C
a,2,4,1
b,5,3,1
c,1,3,7
d,2,6,9
e,1,6,8


In [520]:
df.set_index('A',inplace=True)

In [521]:
df

Unnamed: 0_level_0,B,C
A,Unnamed: 1_level_1,Unnamed: 2_level_1
2,4,1
5,3,1
1,3,7
2,6,9
1,6,8


In [522]:
df.set_index('B',drop=False,append=True)

Unnamed: 0_level_0,Unnamed: 1_level_0,B,C
A,B,Unnamed: 2_level_1,Unnamed: 3_level_1
2,4,4,1
5,3,3,1
1,3,3,7
2,6,6,9
1,6,6,8


In [523]:
df.reset_index(inplace=True)

In [524]:
df

Unnamed: 0,A,B,C
0,2,4,1
1,5,3,1
2,1,3,7
3,2,6,9
4,1,6,8


In [525]:
countries = pd.read_csv(filepath_or_buffer='top_10_countries.csv')

In [526]:
countries

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


In [527]:
countries = pd.read_csv('top_10_countries.csv')

In [529]:
type(countries)

In [530]:
countries2 = pd.read_csv('top_10_countries_no_header.csv', header=None)

In [531]:
countries2

Unnamed: 0,0,1,2,3,4,5
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


## Pandas Notes – Selecting and Filtering DataFrames

### * Selecting Columns

```python
# Single column (returns a Series)
df['Region']

# Multiple columns (returns a DataFrame)
df[['Region', 'Country / Dependency', 'Rank']]
```

* Use single square brackets for one column  
* Pass a list of names (hence the double brackets) for multiple columns  
* Column names here are strings, so they must be quoted

---

### * Selecting Rows with `.iloc` (Index-based)

```python
# Row at index 3
df.iloc[3]

# Rows from index 3 onwards
df.iloc[3:]

# Reverse all rows
df.iloc[::-1]

# Single value: population of row index 2, column index 3
df.iloc[2, 3]

# Rows from position 2 onward, columns at positions 1 through 3 (the stop, 4, is excluded)
df.iloc[2:, 1:4]
```

* `.iloc` uses **integer positions**  
* Format: `df.iloc[rows, columns]`  
* Use colon `:` to slice
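
A compact sketch of positional selection on a made-up frame. The row labels (`r0` to `r3`) are deliberately not integers, to show that `.iloc` ignores labels and counts positions:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'name': ['a', 'b', 'c', 'd'], 'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]},
    index=['r0', 'r1', 'r2', 'r3'],
)

row = df_demo.iloc[1]            # second row, returned as a Series
tail_rows = df_demo.iloc[2:]     # rows at positions 2 and 3
cell = df_demo.iloc[2, 1]        # row position 2, column position 1
block = df_demo.iloc[1:3, 0:2]   # rows 1-2, columns 0-1 (stop is excluded)

print(row['name'])    # b
print(cell)           # 3
print(block.shape)    # (2, 2)
```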

---

### * Selecting Rows/Columns with `.loc` (Label-based)

```python
# Row with label 2, column with label 'Country / Dependency'
df.loc[2, 'Country / Dependency']

# All rows, from column 'Country / Dependency' onwards
df.loc[:, 'Country / Dependency':]

# Select multiple named columns
df.loc[:, ['Population', 'Country / Dependency']]
```

* `.loc` uses **labels** (not integer positions); unlike `.iloc`, a `.loc` slice includes both endpoints  
* Format: `df.loc[rows, columns]`
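
One practical difference between the two indexers is how slices treat the stop value. A minimal sketch on a made-up frame with the default integer index:

```python
import pandas as pd

df_demo = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40], 'z': [5, 6, 7, 8]})

by_label = df_demo.loc[1:2]            # labels 1 and 2, both included
by_position = df_demo.iloc[1:2]        # position 1 only (stop is excluded)
col_slice = df_demo.loc[:, 'x':'y']    # columns 'x' through 'y', both included

print(len(by_label))             # 2
print(len(by_position))          # 1
print(list(col_slice.columns))   # ['x', 'y']
```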

---

### * Filtering Rows with Conditions

```python
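# Note: the Population comparisons below assume a numeric column;
# convert text values such as '333,073,186' to numbers first
# Where Region == 'Asia'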
df[df['Region'] == 'Asia']

# Where Region == 'Asia' AND Population > 300_000_000
df[(df['Region'] == 'Asia') & (df['Population'] > 300_000_000)]

# Where Region == 'Asia' OR Population > 300_000_000
df[(df['Region'] == 'Asia') | (df['Population'] > 300_000_000)]
```

* Use `&` for AND and `|` for OR  
* Each condition must be in parentheses
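
When a numeric-looking column has been read as text with thousands separators, the comparison needs a conversion step first. The following is a sketch under that assumption; `df_demo` is a hypothetical four-row frame using population figures from the countries table, and `pd.to_numeric` is one of several ways to convert:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Brazil'],
    'Region': ['Asia', 'Asia', 'Americas', 'Americas'],
    'Population': ['1,412,600,000', '1,386,946,912', '333,073,186', '214,231,641'],
})

# Strip the thousands separators, then convert the text to numbers.
population = pd.to_numeric(df_demo['Population'].str.replace(',', ''))

is_asia = df_demo['Region'] == 'Asia'
is_large = population > 300_000_000

both = df_demo[is_asia & is_large]     # AND: both conditions hold
either = df_demo[is_asia | is_large]   # OR: at least one condition holds

print(both['Country'].tolist())     # ['China', 'India']
print(either['Country'].tolist())   # ['China', 'India', 'United States']
```

Naming each mask before combining them keeps the parentheses manageable. If the separators are known in advance, passing `thousands=','` to `pd.read_csv()` may avoid the manual conversion.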

---

### * Filtering Rows + Selecting Columns

```python
# Return Rank and Country/Dependency for filtered data
df[(df['Region'] == 'Asia') & (df['Population'] > 300_000_000)][['Rank', 'Country / Dependency']]
```
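
An alternative to chaining two sets of brackets is a single `.loc[rows, columns]` call with a boolean mask as the row selector. A minimal sketch, assuming a numeric `Population` column and a hypothetical frame `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Rank': [1, 2, 3],
    'Country': ['China', 'India', 'United States'],
    'Region': ['Asia', 'Asia', 'Americas'],
    'Population': [1_412_600_000, 1_386_946_912, 333_073_186],
})

# Build the row filter once, then select rows and columns in one step.
mask = (df_demo['Region'] == 'Asia') & (df_demo['Population'] > 300_000_000)
result = df_demo.loc[mask, ['Rank', 'Country']]

print(result['Country'].tolist())   # ['China', 'India']
print(list(result.columns))         # ['Rank', 'Country']
```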

---

### * Quick Methods for Data Inspection

```python
df.head()  # First 5 rows
df.tail()  # Last 5 rows
```

* `head(n)` and `tail(n)` accept a custom row count `n`  
* Useful for previewing large datasets

---

### * Summary

* Use `df[col]` or `df[[col1, col2]]` for column selection  
* Use `.iloc[]` for index-based slicing, `.loc[]` for label-based slicing  
* Filter rows using boolean conditions inside brackets  
* Combine filters with `&` (and), `|` (or)  
* Use `head()` and `tail()` to inspect DataFrames efficiently

In [532]:
import pandas as pd

In [533]:
countries_data = pd.read_csv('top_10_countries.csv')

In [534]:
countries_data

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


In [535]:
countries_data['Region']

Unnamed: 0,Region
0,Asia
1,Asia
2,Americas
3,Asia
4,Asia
5,Americas
6,Africa
7,Asia
8,Europe
9,Americas


In [536]:
type(countries_data['Region'])

In [537]:
countries_data[['Region','Country / Dependency','Population']]

Unnamed: 0,Region,Country / Dependency,Population
0,Asia,China,1412600000
1,Asia,India,1386946912
2,Americas,United States,333073186
3,Asia,Indonesia[b],271350000
4,Asia,Pakistan,225200000
5,Americas,Brazil,214231641
6,Africa,Nigeria,211401000
7,Asia,Bangladesh,172062576
8,Europe,Russia[b],146171015
9,Americas,Mexico,126014024


In [538]:
countries_data.iloc[3]

Unnamed: 0,3
Rank,4
Country / Dependency,Indonesia[b]
Region,Asia
Population,271350000
% of world,3.42%
Date,31-Dec-20


In [539]:
countries_data.iloc[4:]

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


In [540]:
countries_data.iloc[::-1]

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
2,3,United States,Americas,333073186,4.20%,18-Jan-22
1,2,India,Asia,1386946912,17.50%,18-Jan-22
0,1,China,Asia,1412600000,17.80%,31-Dec-21


In [541]:
countries_data.iloc[0:5:2]

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
2,3,United States,Americas,333073186,4.20%,18-Jan-22
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21


In [542]:
countries_data.iloc[2,3]

'333,073,186'

In [543]:
countries_data.iloc[3:,1:4]

Unnamed: 0,Country / Dependency,Region,Population
3,Indonesia[b],Asia,271350000
4,Pakistan,Asia,225200000
5,Brazil,Americas,214231641
6,Nigeria,Africa,211401000
7,Bangladesh,Asia,172062576
8,Russia[b],Europe,146171015
9,Mexico,Americas,126014024


In [544]:
countries_data.loc[5,'Country / Dependency']

'\xa0Brazil'

In [545]:
countries_data.loc[3:,['Country / Dependency','Population']]

Unnamed: 0,Country / Dependency,Population
3,Indonesia[b],271350000
4,Pakistan,225200000
5,Brazil,214231641
6,Nigeria,211401000
7,Bangladesh,172062576
8,Russia[b],146171015
9,Mexico,126014024


In [546]:
countries_data=='Asia'

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,False,False,True,False,False,False
1,False,False,True,False,False,False
2,False,False,False,False,False,False
3,False,False,True,False,False,False
4,False,False,True,False,False,False
5,False,False,False,False,False,False
6,False,False,False,False,False,False
7,False,False,True,False,False,False
8,False,False,False,False,False,False
9,False,False,False,False,False,False


In [547]:
countries_data[countries_data=='Asia']

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,,,Asia,,,
1,,,Asia,,,
2,,,,,,
3,,,Asia,,,
4,,,Asia,,,
5,,,,,,
6,,,,,,
7,,,Asia,,,
8,,,,,,
9,,,,,,


In [548]:
countries_data['Region']=='Asia'

Unnamed: 0,Region
0,True
1,True
2,False
3,True
4,True
5,False
6,False
7,True
8,False
9,False


In [549]:
countries_data[countries_data['Region']=='Asia']

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22


In [550]:
countries_data[
    (countries_data['Region'] == 'Asia') |
    (countries_data['Population'].str.replace(',', '').astype(float) > 300000000)
]

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22


In [551]:
countries_data[
    (countries_data['Region']=='Asia') &
    (countries_data['Population'].str.replace(',','').astype(float)>300000000)
][['Region','Country / Dependency']]

Unnamed: 0,Region,Country / Dependency
0,Asia,China
1,Asia,India


In [552]:
countries_data.head()

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21


In [553]:
countries_data.tail()

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


In [554]:
countries_data.head(3)

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22


## Pandas Notes – Conditional Selection with `isin()`

### * Select rows where a column matches any value from a list

```python
# Example: Select rows where Region is either 'Asia' or 'Americas'
df[df['Region'].isin(['Asia', 'Americas'])]
```

* `isin()` returns a Boolean Series  
* Can be used inside `df[...]` to filter rows  
* Useful for matching multiple values in a single column

---

### * Select rows where a column **does not** match values in a list

```python
# Example: Select rows where Region is NOT 'Asia' or 'Americas'
df[~df['Region'].isin(['Asia', 'Americas'])]
```

* Use `~` (bitwise NOT) to invert the condition  
* Returns rows where condition is **not** met
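
Both forms can be seen side by side on a small Series (the region values are sample data chosen for illustration):

```python
import pandas as pd

regions = pd.Series(['Asia', 'Americas', 'Africa', 'Europe', 'Asia'])
wanted = ['Asia', 'Europe']

mask = regions.isin(wanted)        # True where the value is in the list

print(mask.tolist())               # [True, False, False, True, True]
print(regions[mask].tolist())      # ['Asia', 'Europe', 'Asia']
print(regions[~mask].tolist())     # ['Americas', 'Africa']
```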

---

### * Summary

* `df['col'].isin([val1, val2, ...])` → matches any of the given values  
* `~df['col'].isin([...])` → matches none of the given values  
* Works well for filtering multiple categorical values

In [555]:
countries_data['Region'].isin(['Asia','Europe'])

Unnamed: 0,Region
0,True
1,True
2,False
3,True
4,True
5,False
6,False
7,True
8,True
9,False


In [556]:
countries_data[countries_data['Region'].isin(['Asia','Europe'])]

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21


In [557]:
countries_data[~countries_data['Region'].isin(['Asia','Europe'])]

Unnamed: 0,Rank,Country / Dependency,Region,Population,% of world,Date
2,3,United States,Americas,333073186,4.20%,18-Jan-22
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


## Pandas Notes – Data Manipulation Techniques

### Basic Inspection

- `df.shape` → returns (rows, columns)
- `df.columns` → `Index` of column labels
- `df.index` → row index (a `RangeIndex` by default)

---

### Rename Columns

```python
df.rename(columns={'old_name': 'new_name'}, inplace=True)
```

- Only specify columns you want to rename
- Use `inplace=True` to apply changes directly

---

### Drop Columns

```python
# Drop single or multiple columns
df.drop('col_name', axis=1, inplace=True)
df.drop(['col1', 'col2'], axis=1, inplace=True)
```

- Use `axis=1` for columns
- `inplace=True` to modify DataFrame directly
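
Without `inplace=True`, both `rename()` and `drop()` return a new DataFrame and leave the original as it was. A short sketch using the placeholder column names from the snippets above (`df_demo` is a hypothetical frame):

```python
import pandas as pd

df_demo = pd.DataFrame({'old_name': [1, 2], 'col1': [3, 4], 'col2': [5, 6]})

renamed = df_demo.rename(columns={'old_name': 'new_name'})
trimmed = renamed.drop(columns=['col1', 'col2'])   # same effect as axis=1

print(list(df_demo.columns))   # ['old_name', 'col1', 'col2']  (unchanged)
print(list(trimmed.columns))   # ['new_name']
```

Assigning the result to a new name keeps each intermediate step available for inspection, which can be handy in a notebook.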

---

### Arithmetic on Columns

- You can perform arithmetic directly on numerical columns or with scalars

```python
# Assumes Population is numeric; convert text values (e.g. '333,073,186') first
df['population_millions'] = round(df['Population'] / 1_000_000, 2)
df['country_region'] = df['Country'] + ' - ' + df['Region']
```

- Arithmetic only works on compatible data types

---

### TypeError Example

```python
df['Country'] + df['Population']  # raises TypeError when Population is numeric (str + int)
```

---

### Check Data Types

```python
df.dtypes
```

- Use `.astype()` to change data type

---

### Clean and Convert Object to Numeric

Problem: Percentage column stored as string with `%`  
Solution:

1. Remove `%` symbol
2. Convert to float

```python
# Using .apply() with custom function
def remove_percent(x):
    return x[:-1]  # drop the trailing '%' character

df['% of world'] = df['% of world'].apply(remove_percent)

# Or using lambda function
df['% of world'] = df['% of world'].apply(lambda x: x[:-1])
```

```python
# Convert to float
df['% of world'] = df['% of world'].astype(float)
```
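
As an alternative sketch, the same cleanup can be done with pandas' vectorized string methods instead of `.apply()`. The values below are taken from the percentage column shown in the outputs; `str.rstrip('%')` removes a trailing `%` only when one is present:

```python
import pandas as pd

pct = pd.Series(['17.80%', '17.50%', '4.20%'])

# Remove the trailing '%' from every value, then convert the text to float.
as_float = pct.str.rstrip('%').astype(float)

print(as_float.tolist())   # [17.8, 17.5, 4.2]
print(as_float.dtype)      # float64
```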

---

### Calculate New Column

```python
# Compute world population (Population must already be numeric)
df['world_population'] = df['Population'] / (df['% of world'] / 100)
```

You can also break it into steps:

```python
x = df['Population']
y = df['% of world'] / 100
df['world_population'] = x / y
```

---

### Summary

- Use `.rename()`, `.drop()`, and column assignments to manipulate data
- Use `.apply()` or `lambda` to transform column values
- Use `.astype()` for type conversion
- Combine or derive new columns using arithmetic

```python
df['new_col'] = df['col1'] + df['col2']  # or any valid operation
```

---

In [558]:
import pandas as pd

In [559]:
countries_df = pd.read_csv('top_10_countries.csv')

In [560]:
countries_df.shape

(10, 6)

In [561]:
countries_df.columns

Index(['Rank', 'Country / Dependency', 'Region', 'Population', '% of world',
       'Date'],
      dtype='object')

In [562]:
countries_df.index

RangeIndex(start=0, stop=10, step=1)

In [563]:
countries_df.rename(columns={'Country / Dependency' : 'Country'}, inplace=True)

In [564]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Date
0,1,China,Asia,1412600000,17.80%,31-Dec-21
1,2,India,Asia,1386946912,17.50%,18-Jan-22
2,3,United States,Americas,333073186,4.20%,18-Jan-22
3,4,Indonesia[b],Asia,271350000,3.42%,31-Dec-20
4,5,Pakistan,Asia,225200000,2.84%,01-Jul-21
5,6,Brazil,Americas,214231641,2.70%,18-Jan-22
6,7,Nigeria,Africa,211401000,2.67%,01-Jul-21
7,8,Bangladesh,Asia,172062576,2.17%,18-Jan-22
8,9,Russia[b],Europe,146171015,1.84%,01-Jan-21
9,10,Mexico,Americas,126014024,1.59%,02-Mar-20


In [565]:
#countries_df.drop(labels='Date', axis=1, inplace=True)

In [566]:
countries_df.drop(columns='Date',inplace=True)

In [567]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world
0,1,China,Asia,1412600000,17.80%
1,2,India,Asia,1386946912,17.50%
2,3,United States,Americas,333073186,4.20%
3,4,Indonesia[b],Asia,271350000,3.42%
4,5,Pakistan,Asia,225200000,2.84%
5,6,Brazil,Americas,214231641,2.70%
6,7,Nigeria,Africa,211401000,2.67%
7,8,Bangladesh,Asia,172062576,2.17%
8,9,Russia[b],Europe,146171015,1.84%
9,10,Mexico,Americas,126014024,1.59%


In [568]:
countries_df['Population'].str.replace(',','').astype('float')/10000000

Unnamed: 0,Population
0,141.26
1,138.694691
2,33.307319
3,27.135
4,22.52
5,21.423164
6,21.1401
7,17.206258
8,14.617102
9,12.601402


In [569]:
round(countries_df['Population'].str.replace(',','').astype(float)/1000000,2)

Unnamed: 0,Population
0,1412.6
1,1386.95
2,333.07
3,271.35
4,225.2
5,214.23
6,211.4
7,172.06
8,146.17
9,126.01


In [570]:
countries_df['Population (millions)'] = round(countries_df['Population'].str.replace(',','').astype(float)/1000000,2)

In [571]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Population (millions)
0,1,China,Asia,1412600000,17.80%,1412.6
1,2,India,Asia,1386946912,17.50%,1386.95
2,3,United States,Americas,333073186,4.20%,333.07
3,4,Indonesia[b],Asia,271350000,3.42%,271.35
4,5,Pakistan,Asia,225200000,2.84%,225.2
5,6,Brazil,Americas,214231641,2.70%,214.23
6,7,Nigeria,Africa,211401000,2.67%,211.4
7,8,Bangladesh,Asia,172062576,2.17%,172.06
8,9,Russia[b],Europe,146171015,1.84%,146.17
9,10,Mexico,Americas,126014024,1.59%,126.01


In [572]:
countries_df['Country / Region'] = countries_df['Country'] + ' / ' + countries_df['Region']

In [573]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Population (millions),Country / Region
0,1,China,Asia,1412600000,17.80%,1412.6,China / Asia
1,2,India,Asia,1386946912,17.50%,1386.95,India / Asia
2,3,United States,Americas,333073186,4.20%,333.07,United States / Americas
3,4,Indonesia[b],Asia,271350000,3.42%,271.35,Indonesia[b] / Asia
4,5,Pakistan,Asia,225200000,2.84%,225.2,Pakistan / Asia
5,6,Brazil,Americas,214231641,2.70%,214.23,Brazil / Americas
6,7,Nigeria,Africa,211401000,2.67%,211.4,Nigeria / Africa
7,8,Bangladesh,Asia,172062576,2.17%,172.06,Bangladesh / Asia
8,9,Russia[b],Europe,146171015,1.84%,146.17,Russia[b] / Europe
9,10,Mexico,Americas,126014024,1.59%,126.01,Mexico / Americas


In [574]:
countries_df['Country'] + ' / ' + countries_df['Region']

Unnamed: 0,0
0,China / Asia
1,India / Asia
2,United States / Americas
3,Indonesia[b] / Asia
4,Pakistan / Asia
5,Brazil / Americas
6,Nigeria / Africa
7,Bangladesh / Asia
8,Russia[b] / Europe
9,Mexico / Americas


In [575]:
countries_df.dtypes

Unnamed: 0,0
Rank,int64
Country,object
Region,object
Population,object
% of world,object
Population (millions),float64
Country / Region,object


In [576]:
countries_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Rank                   10 non-null     int64  
 1   Country                10 non-null     object 
 2   Region                 10 non-null     object 
 3   Population             10 non-null     object 
 4   % of world             10 non-null     object 
 5   Population (millions)  10 non-null     float64
 6   Country / Region       10 non-null     object 
dtypes: float64(1), int64(1), object(5)
memory usage: 692.0+ bytes


In [577]:
countries_df['% of world']

Unnamed: 0,% of world
0,17.80%
1,17.50%
2,4.20%
3,3.42%
4,2.84%
5,2.70%
6,2.67%
7,2.17%
8,1.84%
9,1.59%


In [579]:
'17.80%'[:-1]

'17.80'

In [580]:
'17.80%'[:len('17.80%')-1]

'17.80'

In [581]:
def remove_percent(x):
  return x[:-1]

In [582]:
remove_percent('17.80%')

'17.80'

In [583]:
countries_df['% of world'].apply(remove_percent)

Unnamed: 0,% of world
0,17.8
1,17.5
2,4.2
3,3.42
4,2.84
5,2.7
6,2.67
7,2.17
8,1.84
9,1.59


In [584]:
countries_df['% of world'].apply(lambda x: x[:-1])

Unnamed: 0,% of world
0,17.8
1,17.5
2,4.2
3,3.42
4,2.84
5,2.7
6,2.67
7,2.17
8,1.84
9,1.59


In [585]:
countries_df['% of world'] = countries_df['% of world'].apply(lambda x: x[:-1])

In [586]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Population (millions),Country / Region
0,1,China,Asia,1412600000,17.8,1412.6,China / Asia
1,2,India,Asia,1386946912,17.5,1386.95,India / Asia
2,3,United States,Americas,333073186,4.2,333.07,United States / Americas
3,4,Indonesia[b],Asia,271350000,3.42,271.35,Indonesia[b] / Asia
4,5,Pakistan,Asia,225200000,2.84,225.2,Pakistan / Asia
5,6,Brazil,Americas,214231641,2.7,214.23,Brazil / Americas
6,7,Nigeria,Africa,211401000,2.67,211.4,Nigeria / Africa
7,8,Bangladesh,Asia,172062576,2.17,172.06,Bangladesh / Asia
8,9,Russia[b],Europe,146171015,1.84,146.17,Russia[b] / Europe
9,10,Mexico,Americas,126014024,1.59,126.01,Mexico / Americas


In [587]:
countries_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Rank                   10 non-null     int64  
 1   Country                10 non-null     object 
 2   Region                 10 non-null     object 
 3   Population             10 non-null     object 
 4   % of world             10 non-null     object 
 5   Population (millions)  10 non-null     float64
 6   Country / Region       10 non-null     object 
dtypes: float64(1), int64(1), object(5)
memory usage: 692.0+ bytes


In [588]:
countries_df.dtypes

Unnamed: 0,0
Rank,int64
Country,object
Region,object
Population,object
% of world,object
Population (millions),float64
Country / Region,object


In [589]:
countries_df['% of world'].astype(float)

Unnamed: 0,% of world
0,17.8
1,17.5
2,4.2
3,3.42
4,2.84
5,2.7
6,2.67
7,2.17
8,1.84
9,1.59


In [590]:
countries_df['% of world'] = countries_df['% of world'].astype(float)

In [591]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Population (millions),Country / Region
0,1,China,Asia,1412600000,17.8,1412.6,China / Asia
1,2,India,Asia,1386946912,17.5,1386.95,India / Asia
2,3,United States,Americas,333073186,4.2,333.07,United States / Americas
3,4,Indonesia[b],Asia,271350000,3.42,271.35,Indonesia[b] / Asia
4,5,Pakistan,Asia,225200000,2.84,225.2,Pakistan / Asia
5,6,Brazil,Americas,214231641,2.7,214.23,Brazil / Americas
6,7,Nigeria,Africa,211401000,2.67,211.4,Nigeria / Africa
7,8,Bangladesh,Asia,172062576,2.17,172.06,Bangladesh / Asia
8,9,Russia[b],Europe,146171015,1.84,146.17,Russia[b] / Europe
9,10,Mexico,Americas,126014024,1.59,126.01,Mexico / Americas


In [592]:
countries_df.dtypes

Unnamed: 0,0
Rank,int64
Country,object
Region,object
Population,object
% of world,float64
Population (millions),float64
Country / Region,object


In [593]:
countries_df['Population'] = countries_df['Population'].str.replace(',','').astype(float)

In [594]:
countries_df['World Population'] = countries_df['Population']/(countries_df['% of world']/100)

In [595]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Population (millions),Country / Region,World Population
0,1,China,Asia,1412600000.0,17.8,1412.6,China / Asia,7935955000.0
1,2,India,Asia,1386947000.0,17.5,1386.95,India / Asia,7925411000.0
2,3,United States,Americas,333073200.0,4.2,333.07,United States / Americas,7930314000.0
3,4,Indonesia[b],Asia,271350000.0,3.42,271.35,Indonesia[b] / Asia,7934211000.0
4,5,Pakistan,Asia,225200000.0,2.84,225.2,Pakistan / Asia,7929577000.0
5,6,Brazil,Americas,214231600.0,2.7,214.23,Brazil / Americas,7934505000.0
6,7,Nigeria,Africa,211401000.0,2.67,211.4,Nigeria / Africa,7917640000.0
7,8,Bangladesh,Asia,172062600.0,2.17,172.06,Bangladesh / Asia,7929151000.0
8,9,Russia[b],Europe,146171000.0,1.84,146.17,Russia[b] / Europe,7944077000.0
9,10,Mexico,Americas,126014000.0,1.59,126.01,Mexico / Americas,7925410000.0
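
Each row above yields its own estimate of the world total: the country's population divided by its share expressed as a fraction. A minimal sketch of that arithmetic on two rows, assuming the raw figures arrive as text with thousands separators (the `sample` frame is hand-built for illustration):

```python
import pandas as pd

# Two rows in the shape of countries_df; comma-separated text is an assumption
sample = pd.DataFrame({
    'Country': ['China', 'India'],
    'Population': ['1,412,600,000', '1,386,946,912'],
    '% of world': ['17.8', '17.5'],
})

# Strip the separators, then convert both columns to float
sample['Population'] = sample['Population'].str.replace(',', '', regex=False).astype(float)
sample['% of world'] = sample['% of world'].astype(float)

# population / (percentage / 100) gives the implied world population
sample['World Population'] = sample['Population'] / (sample['% of world'] / 100)
print(sample[['Country', 'World Population']])
```

The estimates differ slightly from row to row because the published percentages are rounded.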


In [596]:
x = countries_df['Population']

In [597]:
y = countries_df['% of world']/100

In [598]:
countries_df['World Population'] = x/y

In [599]:
countries_df

Unnamed: 0,Rank,Country,Region,Population,% of world,Population (millions),Country / Region,World Population
0,1,China,Asia,1412600000.0,17.8,1412.6,China / Asia,7935955000.0
1,2,India,Asia,1386947000.0,17.5,1386.95,India / Asia,7925411000.0
2,3,United States,Americas,333073200.0,4.2,333.07,United States / Americas,7930314000.0
3,4,Indonesia[b],Asia,271350000.0,3.42,271.35,Indonesia[b] / Asia,7934211000.0
4,5,Pakistan,Asia,225200000.0,2.84,225.2,Pakistan / Asia,7929577000.0
5,6,Brazil,Americas,214231600.0,2.7,214.23,Brazil / Americas,7934505000.0
6,7,Nigeria,Africa,211401000.0,2.67,211.4,Nigeria / Africa,7917640000.0
7,8,Bangladesh,Asia,172062600.0,2.17,172.06,Bangladesh / Asia,7929151000.0
8,9,Russia[b],Europe,146171000.0,1.84,146.17,Russia[b] / Europe,7944077000.0
9,10,Mexico,Americas,126014000.0,1.59,126.01,Mexico / Americas,7925410000.0


## Pandas Notes – More Data Manipulation Techniques

### Load Data

```python
import pandas as pd
df = pd.read_csv('TfL_daily_cycle.csv')  # Load CSV
```

---

### Clean Up Unwanted Columns

```python
df.drop('Unnamed: 2', axis=1, inplace=True)
```
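
When the unwanted column is completely empty, it can also be removed without naming it. A small sketch on a hand-built frame (the `sample` frame and its values are illustrative, not read from the TfL file):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the cycle-hire data with one all-empty column
sample = pd.DataFrame({
    'Day': ['30/07/2010', '31/07/2010'],
    'Number of Bicycle Hires': [6897.0, 5564.0],
    'Unnamed: 2': [np.nan, np.nan],
})

# Drop a column by name; without inplace=True a new DataFrame is returned
by_name = sample.drop(columns='Unnamed: 2')

# Drop every column whose values are all missing
all_empty_removed = sample.dropna(axis=1, how='all')

print(list(by_name.columns))
print(list(all_empty_removed.columns))
```

Both calls leave `sample` untouched, which makes it easier to re-run a cell without errors about a column that no longer exists.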

---

### Convert to Datetime

#### Method 1: Using `astype`

```python
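# pandas 2.x requires an explicit unit such as [ns]; note that astype() may read
# ambiguous day-first strings (e.g. 01/08/2010) as month-first
df['Date'] = df['Date'].astype('datetime64[ns]')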
```

#### Method 2: Using `pd.to_datetime` with format codes (more reliable)

```python
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
```

- `%d` = Day (2-digit)
- `%m` = Month (2-digit)
- `%Y` = Year (4-digit)
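
To see why the explicit format matters, here is a short sketch using day-first strings in the same style as the TfL file. The `raw` series is made up for illustration; `dayfirst=True` is shown as an alternative when the exact format is not known in advance.

```python
import pandas as pd

# Day-first date strings (dd/mm/yyyy)
raw = pd.Series(['30/07/2010', '01/08/2010', '02/08/2010'])

# An explicit format removes any ambiguity between day and month
parsed = pd.to_datetime(raw, format='%d/%m/%Y')

# dayfirst=True is a looser hint that the day comes before the month
parsed_hint = pd.to_datetime(raw, dayfirst=True)

print(parsed.dt.month.tolist())  # July, August, August
print(parsed.dt.day.tolist())
```

With the format given, `01/08/2010` is read as 1 August 2010 rather than 8 January 2010.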

---

### Sort Values

```python
df.sort_values(by='Total Cycle Hire', ascending=False, inplace=True)
```

- Sorts the rows by the values in the given column
- `ascending=False` puts the largest values first; the default is ascending order
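
A brief sketch of the same call without `inplace=True`, on a made-up frame (the `hires` data is illustrative). It also shows that the original row labels travel with the sorted rows unless they are reset.

```python
import pandas as pd

# Hypothetical three-row frame
hires = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Total Cycle Hire': [5564, 6897, 4303],
})

# Without inplace=True, sort_values returns a sorted copy
busiest_first = hires.sort_values(by='Total Cycle Hire', ascending=False)

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
renumbered = hires.sort_values(by='Total Cycle Hire').reset_index(drop=True)

print(busiest_first)
print(renumbered)
```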

---

### View Top or Bottom Rows

```python
df.head()  # Top 5 rows
df.tail()  # Bottom 5 rows
```

---

### Extract Components from Dates

```python
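# .dt is the pandas accessor for datetime columns; no datetime import is needed

# Extract month-year from Date (choose one format; a second assignment overwrites the first)
# df['Month/Year'] = df['Date'].dt.strftime('%m-%y')  # e.g., 07-21
df['Month/Year'] = df['Date'].dt.strftime('%b-%y')  # e.g., Jul-21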
```
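
Besides formatted strings, the `.dt` accessor exposes numeric components and monthly periods. A small sketch with made-up dates (the `dates` series is illustrative):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2021-07-15', '2021-08-01']))

years = dates.dt.year                 # integer year
months = dates.dt.month               # integer month, 1-12
labels = dates.dt.strftime('%b-%y')   # text labels such as Jul-21
periods = dates.dt.to_period('M')     # monthly Period values

print(labels.tolist())
print(periods.astype(str).tolist())
```

The `strftime` labels are plain strings, so they sort alphabetically; the period values sort chronologically, which can be handier for grouping by month.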

---

### Transpose DataFrame

```python
df.T  # Transpose (rows become columns)
```

#### Set custom index before transpose

```python
df.set_index('Month/Year', inplace=True)
df.T
```
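
A minimal sketch of the effect, using a made-up two-row frame (the `monthly` data is illustrative). After setting the index, the transposed frame uses the month labels as column headers instead of 0, 1, ...

```python
import pandas as pd

# Hypothetical monthly totals
monthly = pd.DataFrame({
    'Month/Year': ['Jul-21', 'Aug-21'],
    'Total Cycle Hire': [100, 120],
})

# set_index returns a new frame here; .T then swaps rows and columns
wide = monthly.set_index('Month/Year').T

print(wide)
```

This works best when the index values are unique; repeated labels produce repeated column names in the transposed result.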

---

### Explode Column

- Turns list-type values in a column into separate rows

```python
df.explode('Column_A')
```

- Values in the other columns are repeated for each new row, and the original index label is repeated as well
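
A short sketch of the repeated index labels and one way to renumber them, on a made-up frame (`orders` and `Column_B` are illustrative names):

```python
import pandas as pd

# Column_A mixes lists and a scalar
orders = pd.DataFrame({
    'Column_A': [[2, 5], 2, [3, 4]],
    'Column_B': ['x', 'y', 'z'],
})

# Default behaviour keeps the original row label for every expanded row
exploded = orders.explode('Column_A')

# ignore_index=True renumbers the rows 0..n-1 instead
flat = orders.explode('Column_A', ignore_index=True)

print(exploded.index.tolist())
print(flat)
```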

---

### Summary

- Use `.drop()`, `.sort_values()`, `.set_index()` for basic data manipulation
- Convert date strings with `pd.to_datetime()` and format them as labels with `.dt.strftime()`
- Transpose using `.T`, useful after setting a meaningful index
- `.explode()` is useful to flatten list-type columns into multiple rows

In [600]:
import pandas as pd

In [601]:
tfl_df = pd.read_csv('/content/tfl-daily-cycle-hires.csv')

In [602]:
tfl_df.dtypes

Unnamed: 0,0
Day,object
Number of Bicycle Hires,float64
Unnamed: 2,float64


In [603]:
tfl_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4081 entries, 0 to 4080
Data columns (total 3 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Day                      4081 non-null   object 
 1   Number of Bicycle Hires  4081 non-null   float64
 2   Unnamed: 2               0 non-null      float64
dtypes: float64(2), object(1)
memory usage: 95.8+ KB


In [604]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires,Unnamed: 2
0,30/07/2010,6897.0,
1,31/07/2010,5564.0,
2,01/08/2010,4303.0,
3,02/08/2010,6642.0,
4,03/08/2010,7966.0,
...,...,...,...
4076,26/09/2021,45120.0,
4077,27/09/2021,32167.0,
4078,28/09/2021,32539.0,
4079,29/09/2021,39889.0,


In [605]:
tfl_df.drop(columns='Unnamed: 2',inplace=True)

In [606]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires
0,30/07/2010,6897.0
1,31/07/2010,5564.0
2,01/08/2010,4303.0
3,02/08/2010,6642.0
4,03/08/2010,7966.0
...,...,...
4076,26/09/2021,45120.0
4077,27/09/2021,32167.0
4078,28/09/2021,32539.0
4079,29/09/2021,39889.0


In [607]:
tfl_df.dtypes

Unnamed: 0,0
Day,object
Number of Bicycle Hires,float64


In [608]:
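# Caution: astype() guesses the layout, so ambiguous day-first strings such as 01/08/2010
# come out month-first (2010-01-08 in the output below).
# pd.to_datetime(tfl_df['Day'], format='%d/%m/%Y') on the raw strings avoids this.
tfl_df['Day'] = tfl_df['Day'].astype('datetime64[ns]')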

In [609]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires
0,2010-07-30,6897.0
1,2010-07-31,5564.0
2,2010-01-08,4303.0
3,2010-02-08,6642.0
4,2010-03-08,7966.0
...,...,...
4076,2021-09-26,45120.0
4077,2021-09-27,32167.0
4078,2021-09-28,32539.0
4079,2021-09-29,39889.0


In [610]:
tfl_df.dtypes

Unnamed: 0,0
Day,datetime64[ns]
Number of Bicycle Hires,float64


In [612]:
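# 'Day' is already datetime64 at this point, so format= has no effect here;
# it only fixes the day/month order when applied to the raw strings
pd.to_datetime(tfl_df['Day'],format='%d/%m/%Y')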

Unnamed: 0,Day
0,2010-07-30
1,2010-07-31
2,2010-01-08
3,2010-02-08
4,2010-03-08
...,...
4076,2021-09-26
4077,2021-09-27
4078,2021-09-28
4079,2021-09-29


In [613]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires
0,2010-07-30,6897.0
1,2010-07-31,5564.0
2,2010-01-08,4303.0
3,2010-02-08,6642.0
4,2010-03-08,7966.0
...,...,...
4076,2021-09-26,45120.0
4077,2021-09-27,32167.0
4078,2021-09-28,32539.0
4079,2021-09-29,39889.0


In [615]:
tfl_df.sort_values(by='Number of Bicycle Hires',ascending=False,inplace=True)

In [616]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires
1805,2015-09-07,73094.0
3592,2020-05-30,70170.0
3587,2020-05-25,67034.0
3606,2020-06-13,65045.0
3613,2020-06-20,64041.0
...,...,...
151,2010-12-28,3763.0
905,2013-01-20,3728.0
555,2012-05-02,3531.0
141,2010-12-18,2805.0


In [618]:
tfl_df.head()

Unnamed: 0,Day,Number of Bicycle Hires
1805,2015-09-07,73094.0
3592,2020-05-30,70170.0
3587,2020-05-25,67034.0
3606,2020-06-13,65045.0
3613,2020-06-20,64041.0


In [619]:
tfl_df.tail()

Unnamed: 0,Day,Number of Bicycle Hires
151,2010-12-28,3763.0
905,2013-01-20,3728.0
555,2012-05-02,3531.0
141,2010-12-18,2805.0
142,2010-12-19,2764.0


In [620]:
import datetime as dt  # not required: .dt below is the pandas Series accessor, unrelated to this module

In [621]:
tfl_df['Day'].dt.strftime('%m-%y')

Unnamed: 0,Day
1805,09-15
3592,05-20
3587,05-20
3606,06-20
3613,06-20
...,...
151,12-10
905,01-13
555,05-12
141,12-10


In [622]:
tfl_df['Month / Year'] = tfl_df['Day'].dt.strftime('%b-%y')

In [623]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires,Month / Year
1805,2015-09-07,73094.0,Sep-15
3592,2020-05-30,70170.0,May-20
3587,2020-05-25,67034.0,May-20
3606,2020-06-13,65045.0,Jun-20
3613,2020-06-20,64041.0,Jun-20
...,...,...,...
151,2010-12-28,3763.0,Dec-10
905,2013-01-20,3728.0,Jan-13
555,2012-05-02,3531.0,May-12
141,2010-12-18,2805.0,Dec-10


In [624]:
tfl_df.T

Unnamed: 0,1805,3592,3587,3606,3613,1833,3593,3607,3963,3641,...,155,149,1251,2,150,151,905,555,141,142
Day,2015-09-07 00:00:00,2020-05-30 00:00:00,2020-05-25 00:00:00,2020-06-13 00:00:00,2020-06-20 00:00:00,2015-06-08 00:00:00,2020-05-31 00:00:00,2020-06-14 00:00:00,2021-05-06 00:00:00,2020-07-18 00:00:00,...,2011-01-01 00:00:00,2010-12-26 00:00:00,2014-01-01 00:00:00,2010-01-08 00:00:00,2010-12-27 00:00:00,2010-12-28 00:00:00,2013-01-20 00:00:00,2012-05-02 00:00:00,2010-12-18 00:00:00,2010-12-19 00:00:00
Number of Bicycle Hires,73094.0,70170.0,67034.0,65045.0,64041.0,63963.0,63116.0,57516.0,56900.0,56654.0,...,4555.0,4383.0,4327.0,4303.0,3971.0,3763.0,3728.0,3531.0,2805.0,2764.0
Month / Year,Sep-15,May-20,May-20,Jun-20,Jun-20,Jun-15,May-20,Jun-20,May-21,Jul-20,...,Jan-11,Dec-10,Jan-14,Jan-10,Dec-10,Dec-10,Jan-13,May-12,Dec-10,Dec-10


In [625]:
tfl_df.transpose()

Unnamed: 0,1805,3592,3587,3606,3613,1833,3593,3607,3963,3641,...,155,149,1251,2,150,151,905,555,141,142
Day,2015-09-07 00:00:00,2020-05-30 00:00:00,2020-05-25 00:00:00,2020-06-13 00:00:00,2020-06-20 00:00:00,2015-06-08 00:00:00,2020-05-31 00:00:00,2020-06-14 00:00:00,2021-05-06 00:00:00,2020-07-18 00:00:00,...,2011-01-01 00:00:00,2010-12-26 00:00:00,2014-01-01 00:00:00,2010-01-08 00:00:00,2010-12-27 00:00:00,2010-12-28 00:00:00,2013-01-20 00:00:00,2012-05-02 00:00:00,2010-12-18 00:00:00,2010-12-19 00:00:00
Number of Bicycle Hires,73094.0,70170.0,67034.0,65045.0,64041.0,63963.0,63116.0,57516.0,56900.0,56654.0,...,4555.0,4383.0,4327.0,4303.0,3971.0,3763.0,3728.0,3531.0,2805.0,2764.0
Month / Year,Sep-15,May-20,May-20,Jun-20,Jun-20,Jun-15,May-20,Jun-20,May-21,Jul-20,...,Jan-11,Dec-10,Jan-14,Jan-10,Dec-10,Dec-10,Jan-13,May-12,Dec-10,Dec-10


In [626]:
tfl_df

Unnamed: 0,Day,Number of Bicycle Hires,Month / Year
1805,2015-09-07,73094.0,Sep-15
3592,2020-05-30,70170.0,May-20
3587,2020-05-25,67034.0,May-20
3606,2020-06-13,65045.0,Jun-20
3613,2020-06-20,64041.0,Jun-20
...,...,...,...
151,2010-12-28,3763.0,Dec-10
905,2013-01-20,3728.0,Jan-13
555,2012-05-02,3531.0,May-12
141,2010-12-18,2805.0,Dec-10


In [629]:
tfl_df.set_index(keys='Month / Year', inplace=True)

In [631]:
tfl_df.transpose()

Month / Year,Sep-15,May-20,May-20.1,Jun-20,Jun-20.1,Jun-15,May-20.2,Jun-20.2,May-21,Jul-20,...,Jan-11,Dec-10,Jan-14,Jan-10,Dec-10.1,Dec-10.2,Jan-13,May-12,Dec-10.3,Dec-10.4
Day,2015-09-07 00:00:00,2020-05-30 00:00:00,2020-05-25 00:00:00,2020-06-13 00:00:00,2020-06-20 00:00:00,2015-06-08 00:00:00,2020-05-31 00:00:00,2020-06-14 00:00:00,2021-05-06 00:00:00,2020-07-18 00:00:00,...,2011-01-01 00:00:00,2010-12-26 00:00:00,2014-01-01 00:00:00,2010-01-08 00:00:00,2010-12-27 00:00:00,2010-12-28 00:00:00,2013-01-20 00:00:00,2012-05-02 00:00:00,2010-12-18 00:00:00,2010-12-19 00:00:00
Number of Bicycle Hires,73094.0,70170.0,67034.0,65045.0,64041.0,63963.0,63116.0,57516.0,56900.0,56654.0,...,4555.0,4383.0,4327.0,4303.0,3971.0,3763.0,3728.0,3531.0,2805.0,2764.0


In [634]:
column_index = ['A','B']
data_list = [[[2,5], 'nested list'], [2, 'not a nested list'], [[3,4],'nested list 2']]

In [635]:
df=pd.DataFrame(data=data_list,columns=column_index)

In [636]:
df

Unnamed: 0,A,B
0,"[2, 5]",nested list
1,2,not a nested list
2,"[3, 4]",nested list 2


In [637]:
df.explode(column='A')

Unnamed: 0,A,B
0,2,nested list
0,5,nested list
1,2,not a nested list
2,3,nested list 2
2,4,nested list 2


In [628]:
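# Executed as cell 628, before set_index in cell 629, while 'Month / Year' was still a column.
# Each day belongs to exactly one month, so almost every cell in the result is NaN.
tfl_df.pivot(index='Day',columns='Month / Year',values='Number of Bicycle Hires')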

Month / Year,Apr-10,Apr-11,Apr-12,Apr-13,Apr-14,Apr-15,Apr-16,Apr-17,Apr-18,Apr-19,...,Sep-12,Sep-13,Sep-14,Sep-15,Sep-16,Sep-17,Sep-18,Sep-19,Sep-20,Sep-21
Day,,,,,,,,,,,,,,,,,,,,,
2010-01-08,,,,,,,,,,,...,,,,,,,,,,
2010-01-09,,,,,,,,,,,...,,,,,,,,,,
2010-01-10,,,,,,,,,,,...,,,,,,,,,,
2010-01-11,,,,,,,,,,,...,,,,,,,,,,
2010-01-12,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-12-05,,,,,,,,,,,...,,,,,,,,,,
2021-12-06,,,,,,,,,,,...,,,,,,,,,,
2021-12-07,,,,,,,,,,,...,,,,,,,,,,
2021-12-08,,,,,,,,,,,...,,,,,,,,,,
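
The table above is almost entirely empty because each full date appears under only one month column. A hedged sketch of a denser layout on made-up data: pivoting on the day of the month instead of the full date (the `hires` frame and the `Day of Month` column are hypothetical names introduced for illustration).

```python
import pandas as pd

# Hypothetical four-day sample spanning two months
hires = pd.DataFrame({
    'Day': pd.to_datetime(['2021-07-01', '2021-07-02', '2021-08-01', '2021-08-02']),
    'Number of Bicycle Hires': [10.0, 20.0, 30.0, 40.0],
})
hires['Month / Year'] = hires['Day'].dt.strftime('%b-%y')
hires['Day of Month'] = hires['Day'].dt.day

# Full dates as the index: every row has a value in only one column
sparse = hires.pivot(index='Day', columns='Month / Year',
                     values='Number of Bicycle Hires')

# Day of month as the index: each month column fills the same rows
dense = hires.pivot(index='Day of Month', columns='Month / Year',
                    values='Number of Bicycle Hires')

print(sparse)
print(dense)
```

`pivot` requires each index/column pair to be unique; for data with repeated pairs, `pivot_table` with an aggregation function is the usual alternative.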
