### Pandas Tutorial

#### What is pandas?
Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Most data will be processed with the help of `DataFrames`.

List of milestones:

- Key properties of a `DataFrame`
- Read CSV into a `DataFrame`
- Read a database into a `DataFrame`

#### Datasets Required
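1. Annual motor vehicle population by vehicle type (`annual-motor-vehicle-population-by-vehicle-type.csv`)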

All datasets are downloaded from data.gov.sg

#### Step 1. Imports

In [3]:
#Import pandas and numpy

import pandas as pd
import numpy as np

### Instantiating a Dataframe

You can create a dataframe from a CSV file.

Use `pd.read_csv()` to read CSV data into a DataFrame.

Documentation can be seen [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv).

CSV Source can be accessed [here](https://data.gov.sg/dataset/annual-motor-vehicle-population-by-vehicle-type?view_id=6aca1157-ea79-4e39-9e58-3e5313a9a715&resource_id=dec53407-9f97-47b8-ba89-b2070569a09e)

In [4]:
#To read from CSV file

dataframe = pd.read_csv('annual-motor-vehicle-population-by-vehicle-type.csv')
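
If the CSV file is not at hand, the same call can be tried on in-memory text. The sketch below is only an illustration: `csv_text` and `sample_df` are hypothetical names, and the three rows are copied from the first rows of this dataset as printed later in this tutorial.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV using the same column layout as the tutorial file.
csv_text = """year,category,type,number
2005,Cars & Station-wagons,Private cars,401638
2006,Cars & Station-wagons,Private cars,421904
2007,Cars & Station-wagons,Private cars,451745
"""

# read_csv accepts a file path or a file-like object, so StringIO works here.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)
print(sample_df.dtypes)
```

The first line of the text is used as the header row, which is the default behaviour of `pd.read_csv()`.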


The attributes below show the <b>basic details</b> of the dataframe.

- dataframe.dtypes (column types)
- dataframe.columns (available columns)
- dataframe.shape (number of rows and columns)


In [6]:
#to check column types
print(dataframe.dtypes)

print('-----------------')

#to check column names
print(dataframe.columns)

print('-----------------')

#to check number of rows and columns
print(dataframe.shape)



year         int64
category    object
type        object
number       int64
dtype: object
-----------------
Index(['year', 'category', 'type', 'number'], dtype='object')
-----------------
(260, 4)


The methods below show the <b>data inside</b> the dataframe.

- dataframe.head() (returns the first 5 rows by default)

Try dataframe.head(*integer*) to choose how many rows are shown.

- dataframe.sample() (returns one randomly sampled row by default)

In [7]:
#try .head()

dataframe.head()

   year               category          type  number
0  2005  Cars & Station-wagons  Private cars  401638
1  2006  Cars & Station-wagons  Private cars  421904
2  2007  Cars & Station-wagons  Private cars  451745
3  2008  Cars & Station-wagons  Private cars  476634
4  2009  Cars & Station-wagons  Private cars  497116


In [8]:
#try .sample()

dataframe.sample()

     year                category                               type  number
104  2009  Goods & Other Vehicles  Very Heavy Goods Vehicles (VHGVs)   12962
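
Both methods accept a row count. The following is a minimal sketch on a small stand-in frame (`demo` is a hypothetical name; its values are taken from the rows shown above). Passing `random_state` to `.sample()` is optional and is used here only so the random pick can be repeated.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial dataframe.
demo = pd.DataFrame({
    "year": [2005, 2006, 2007, 2008, 2009],
    "number": [401638, 421904, 451745, 476634, 497116],
})

first_two = demo.head(2)                         # first 2 rows instead of the default 5
picked = demo.sample(n=3, random_state=0)        # 3 random rows; the seed makes it repeatable
picked_again = demo.sample(n=3, random_state=0)  # same seed, same rows

print(first_two)
print(picked)
```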


### Retrieving data from a dataframe for data processing

#### Retrieving columns:
A column retrieved and stored as a variable is of type <b>Series</b>.

Extract the column with dataframe["Column_Name"].

In [10]:
#Extract the "year" column and store it into series

series = dataframe["year"]
type(series)

pandas.core.series.Series
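
As a side note, the kind of object returned depends on what goes inside the brackets. The sketch below assumes a small hypothetical frame named `demo`: a single column name gives a Series, while a list of column names gives a DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial dataframe.
demo = pd.DataFrame({
    "year": [2005, 2006, 2007],
    "type": ["Private cars"] * 3,
    "number": [401638, 421904, 451745],
})

year_series = demo["year"]          # one column name -> Series
subset = demo[["year", "number"]]   # list of column names -> DataFrame

print(type(year_series).__name__)
print(type(subset).__name__)
```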

Similar to a dataframe, .sample() and .head() can be used to extract information from a Series.

In [13]:
#Try .sample() and .head() on series

print(series.sample())

print ("-------------")

print(series.head())

256    2017
Name: year, dtype: int64
-------------
0    2005
1    2006
2    2007
3    2008
4    2009
Name: year, dtype: int64


##### Retrieve a specific record from the dataframe

Retrieve a specific record from the dataframe with dataframe.iloc[].

.iloc selects rows by integer position and is zero-indexed, so the first row is at position 0.

In [15]:
#Get the details of row index 2. Check if it has a year 2007. Name your variable row_2

row_2 = dataframe.iloc[2]
print(row_2)

year                         2007
category    Cars & Station-wagons
type                 Private cars
number                     451745
Name: 2, dtype: object
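
Position and index label happen to coincide in this dataset because the index is the default 0, 1, 2, and so on. The sketch below uses a hypothetical frame `demo` with custom index labels to show the difference between `.iloc` (position) and `.loc` (label).

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0, 1, 2.
demo = pd.DataFrame(
    {"year": [2005, 2006, 2007], "number": [401638, 421904, 451745]},
    index=[10, 11, 12],
)

by_position = demo.iloc[2]   # third row, counting from 0
by_label = demo.loc[12]      # row whose index label is 12

print(by_position["year"])
print(by_label["year"])
```

Both lookups return the same row here, but `demo.iloc[12]` would raise an `IndexError` because the frame only has three positions.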


##### Complete the following code to get the number, category and type of vehicle from the row at index 2 (row_2), selected in the previous cell.

Use the column indexing method to retrieve data in a specific column, e.g. row_2['Column_name']

In [20]:
veh_num = 0
veh_type = ''
veh_cat = ''
veh_year = 0
###############

veh_cat = row_2['category']
veh_type = row_2['type']
veh_num = row_2['number']
veh_year = row_2['year']

print('In ' + str(veh_year) + ', there are ' + str(veh_num) + ' ' + str(veh_type) + ' in Singapore')
print(str(veh_type) + ' belong to the ' + str(veh_cat) + ' category.')

In 2007, there are 451745 Private cars in Singapore
Private cars belong to the Cars & Station-wagons category.


### Filtering records

#### You can filter the dataframe to get the subset of data that you want.

Use the following code structure to do so.

```python
dataframe[dataframe['column'] <conditional operator> <value>]
```

----------------------------------

Before filtering, it is good practice to make a copy of the dataframe first, so that modifications made while working with the filtered data do not affect the original dataframe.

Do so with dataframe.copy().
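
A minimal sketch of why the copy matters, using hypothetical names `original` and `working`: a change made to the copy leaves the original untouched.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the tutorial dataframe.
original = pd.DataFrame({"year": [2005, 2006], "number": [401638, 421904]})

working = original.copy()       # independent copy (deep by default)
working.loc[0, "number"] = 0    # modify the copy only

print(original.loc[0, "number"])
print(working.loc[0, "number"])
```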

##### Next, identify the records whose number of vehicles is above 100_000

Insert the condition by creating a condition variable:
- condition = dataframe2['Column_name'] > Integer

Apply the condition:
- dataframe2[condition]

Another way to filter is to define the condition directly inside the brackets.
- dataframe2[dataframe2['Column_name'] > Integer]

In [29]:
#Before filtering, it is good practice to make a copy of the dataframe first, so modifications made
#while working with the filtered data do not affect the original dataframe.

dataframe2 = dataframe.copy()

condition = dataframe2['number'] > 100_000

print(dataframe2[condition])

#Uncomment this to view result
# print(dataframe2[dataframe2['number'] > 100_000])

     year                  category                      type  number
0    2005     Cars & Station-wagons              Private cars  401638
1    2006     Cars & Station-wagons              Private cars  421904
2    2007     Cars & Station-wagons              Private cars  451745
3    2008     Cars & Station-wagons              Private cars  476634
4    2009     Cars & Station-wagons              Private cars  497116
5    2010     Cars & Station-wagons              Private cars  511125
6    2011     Cars & Station-wagons              Private cars  520614
7    2012     Cars & Station-wagons              Private cars  535233
8    2013     Cars & Station-wagons              Private cars  540063
9    2014     Cars & Station-wagons              Private cars  536882
60   2005               Motorcycles               Motorcycles  138588
61   2006               Motorcycles               Motorcycles  141881
62   2007               Motorcycles               Motorcycles  143482
63   2008           

##### Besides integers, you can also filter by comparing against strings.

dataframe2['type'] == 'Motorcycles and Scooters'

In [30]:
print(dataframe2[dataframe2['type'] == 'Motorcycles and Scooters'])

     year                  category                      type  number
246  2017  Motorcycles and Scooters  Motorcycles and Scooters  141304
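
Numeric and string conditions can also be combined. The following is a sketch on a hypothetical four-row frame `demo` (values taken from the outputs above): wrap each condition in parentheses, then join them with `&` for "and" or `|` for "or".

```python
import pandas as pd

# Hypothetical stand-in for the tutorial dataframe.
demo = pd.DataFrame({
    "year": [2005, 2005, 2006, 2006],
    "type": ["Private cars", "Motorcycles", "Private cars", "Motorcycles"],
    "number": [401638, 138588, 421904, 141881],
})

# Both conditions must hold: private cars in 2006.
cars_2006 = demo[(demo["type"] == "Private cars") & (demo["year"] == 2006)]

# Either condition may hold: more than 400_000 vehicles, or any record from 2006.
big_or_2006 = demo[(demo["number"] > 400_000) | (demo["year"] == 2006)]

print(cars_2006)
print(big_or_2006)
```

The parentheses are needed because `&` and `|` bind more tightly than comparison operators such as `==` and `>` in Python.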
