### Pandas 101
In this notebook we will learn how to use Pandas, the Python data analysis library, to manipulate and analyse data.

In [13]:
import pandas as pd

In [14]:
import numpy as np

### 1. Reading, Writing, and Creating data
Pandas lets you read and write data in multiple formats, such as CSV, Excel, and JSON.

You can load a data file from your local system or from an external URL.
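The reader functions accept a file path, a URL, or any file-like object. As a minimal offline sketch, the example below wraps small CSV and JSON strings in `io.StringIO` to stand in for a local file or URL; the sample values are borrowed from the marks example later in this notebook. (`pd.read_excel` works the same way but needs an extra engine such as openpyxl installed, so it is left out here.)

```python
import io

import pandas as pd

# In-memory text stands in for a local file path or a URL (illustration only)
csv_text = "Subjects,Marks\nMathematics,67\nEnglish,60\n"
json_text = '[{"Subjects": "History", "Marks": 36}, {"Subjects": "Science", "Marks": 61}]'

# read_csv and read_json accept a path, a URL, or a file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```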

#### 1.1 Create data
You can easily convert your Python lists into a DataFrame object.

The DataFrame is the most commonly used Pandas data structure. It is a 2-dimensional, table-like structure whose columns can hold different data types.
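To see that each column keeps its own type, here is a small sketch that builds a DataFrame from a dictionary with a text column, an integer column, and a boolean column. The `Passed` column is a hypothetical addition for illustration; the other values come from the marks example.

```python
import pandas as pd

# One DataFrame, three columns of different types
df = pd.DataFrame({
    'Subjects': ['Mathematics', 'English', 'History'],  # text
    'Marks': [67, 60, 36],                               # integers
    'Passed': [True, True, False],                       # booleans (hypothetical column)
})

# Each column reports its own dtype
print(df.dtypes)
print(df.shape)  # (3, 3): rows, columns
```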

In [15]:
# Two parallel lists: subject names and the corresponding marks
subjects = ['Mathematics', 'English', 'History', 'Science', 'Arts']
marks = [67, 60, 36, 61, 58]

Using Python's built-in **zip** function, we can pair up the elements of these two **list** sequences into a single sequence of tuples.

In [16]:
# list() materialises the zip iterator so it can be displayed and reused
marks_dataset = list(zip(subjects, marks))
marks_dataset

[('Mathematics', 67),
 ('English', 60),
 ('History', 36),
 ('Science', 61),
 ('Arts', 58)]
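In Python 3, `zip` returns a lazy iterator that can be consumed only once, which is why materialising it with `list()` is the safe way to display or reuse the pairs. This short sketch shows the iterator being exhausted after the first pass:

```python
subjects = ['Mathematics', 'English', 'History', 'Science', 'Arts']
marks = [67, 60, 36, 61, 58]

# zip() returns an iterator, not a list
pairs = zip(subjects, marks)
print(pairs)  # prints something like <zip object at 0x...>

first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing is left to consume
print(first_pass[:2])      # [('Mathematics', 67), ('English', 60)]
print(second_pass)         # []
```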

In [17]:
# Convert the new list to a DataFrame
marks_df = pd.DataFrame(marks_dataset, columns=['Subjects', 'Marks'])
marks_df

|   | Subjects    | Marks |
|---|-------------|-------|
| 0 | Mathematics | 67    |
| 1 | English     | 60    |
| 2 | History     | 36    |
| 3 | Science     | 61    |
| 4 | Arts        | 58    |


The **columns** argument supplies the label for each column, matched by position.
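A quick way to see what the labels do is to build the same DataFrame with and without them. When no labels are passed, Pandas falls back to integer positions:

```python
import pandas as pd

pairs = [('Mathematics', 67), ('English', 60)]

# Without columns=, the columns are labelled 0, 1, ...
unlabeled = pd.DataFrame(pairs)
# With columns=, each label is assigned to a column by position
labeled = pd.DataFrame(pairs, columns=['Subjects', 'Marks'])

print(list(unlabeled.columns))  # [0, 1]
print(list(labeled.columns))    # ['Subjects', 'Marks']
```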

Add a new column 'Result' using **numpy.where**: set it to 'Pass' if **Marks >= 40**, otherwise 'Fail'.

In [18]:
marks_df['Result'] = np.where(marks_df['Marks']>=40, 'Pass', 'Fail')
marks_df

|   | Subjects    | Marks | Result |
|---|-------------|-------|--------|
| 0 | Mathematics | 67    | Pass   |
| 1 | English     | 60    | Pass   |
| 2 | History     | 36    | Fail   |
| 3 | Science     | 61    | Pass   |
| 4 | Arts        | 58    | Pass   |
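Under the hood, `np.where` works in two steps: the comparison produces a boolean mask with one value per row, and `np.where` then picks the second argument where the mask is True and the third where it is False. A minimal sketch that rebuilds the marks data and prints both steps:

```python
import numpy as np
import pandas as pd

marks_df = pd.DataFrame({
    'Subjects': ['Mathematics', 'English', 'History', 'Science', 'Arts'],
    'Marks': [67, 60, 36, 61, 58],
})

# Step 1: the comparison yields one True/False per row
passed = marks_df['Marks'] >= 40
print(passed.tolist())  # [True, True, False, True, True]

# Step 2: np.where chooses 'Pass' where the mask is True, 'Fail' elsewhere
result = np.where(passed, 'Pass', 'Fail')
print(result.tolist())  # ['Pass', 'Pass', 'Fail', 'Pass', 'Pass']
```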


To delete a column (say 'Result'), we can use **`marks_df.pop('Result')`**.
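`pop` modifies the DataFrame itself and hands back the removed column as a Series. If you would rather keep the original intact, `drop(columns=...)` returns a new DataFrame instead. The sketch below, on a shortened copy of the marks data, contrasts the two:

```python
import pandas as pd

marks_df = pd.DataFrame({
    'Subjects': ['Mathematics', 'History'],
    'Marks': [67, 36],
    'Result': ['Pass', 'Fail'],
})

# drop() returns a new DataFrame and leaves marks_df unchanged
without_result = marks_df.drop(columns=['Result'])
print(list(without_result.columns))  # ['Subjects', 'Marks']
print(list(marks_df.columns))        # ['Subjects', 'Marks', 'Result']

# pop() removes the column from marks_df itself and returns it as a Series
result = marks_df.pop('Result')
print(result.tolist())               # ['Pass', 'Fail']
print(list(marks_df.columns))        # ['Subjects', 'Marks']
```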

#### 1.2 Write data
We can write the **DataFrame** object to different file types.

In [19]:
# Save the marks DataFrame to a CSV (comma-separated values) file in the current working directory
marks_df.to_csv('marks.csv', index=False)

The argument **index=False** prevents the row index (0...4) from being written to the file.
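To see why this matters, the sketch below writes the same DataFrame with and without its index (calling `to_csv()` with no path returns the CSV text as a string) and reads each version back. The indexed version comes back with an extra `Unnamed: 0` column holding the old row numbers:

```python
import io

import pandas as pd

marks_df = pd.DataFrame({'Subjects': ['Mathematics', 'English'], 'Marks': [67, 60]})

# With no path, to_csv() returns the CSV text instead of writing a file
with_index = marks_df.to_csv()              # index written as an unnamed first column
without_index = marks_df.to_csv(index=False)

print(with_index)
print(without_index)

# Reading the indexed text back produces an extra 'Unnamed: 0' column
print(list(pd.read_csv(io.StringIO(with_index)).columns))     # ['Unnamed: 0', 'Subjects', 'Marks']
print(list(pd.read_csv(io.StringIO(without_index)).columns))  # ['Subjects', 'Marks']
```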

#### 1.3 Read data
You can read data into Pandas from different types of files.

**Here, we will be using the 'YouTube Channel Dataset'.** [https://gist.github.com/pravj/9ae9e67d10668c60545e2b858753415c]

It records the audience reach (views and comments) of two YouTube channels, named '*WorldNews*' and '*WorldWeather*'.

In [20]:
# Read the CSV dataset from an external URL
channels_df = pd.read_csv('http://bit.ly/pandas-101-dataset-csv')
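Reading from the URL needs network access. As an offline stand-in, this sketch parses three rows copied from the dataset's `head()` output shown later in this notebook. It also passes `parse_dates`, an optional `read_csv` argument the notebook itself does not use, to turn the `Date` strings into datetime values:

```python
import io

import pandas as pd

# Three rows copied from the dataset's head() output (offline stand-in for the URL)
csv_text = """Channel,Date,Views,Comments
WorldNews,2015-01-01,102,20
WorldWeather,2015-01-01,74,14
WorldNews,2015-01-02,100,32
"""

# parse_dates converts the 'Date' column from strings to datetime64 values
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(sample_df.dtypes)
print(sample_df['Date'].dt.day_name().tolist())
```

With real dates in place, date-based operations such as `.dt.day_name()` or filtering by month become available.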

### 2. Selecting and Filtering dataframes
In this section, we will learn how to select parts of a DataFrame and filter the rows that satisfy specific conditions.

In [21]:
# Show summary statistics for the numeric columns
channels_df.describe()

|       | Views      | Comments  |
|-------|------------|-----------|
| count | 362.0      | 362.0     |
| mean  | 284.093923 | 17.524862 |
| std   | 144.301074 | 11.144266 |
| min   | 52.0       | 3.0       |
| 25%   | 170.25     | 8.0       |
| 50%   | 291.0      | 15.0      |
| 75%   | 400.0      | 24.0      |
| max   | 538.0      | 53.0      |


Select the first 5 rows of the DataFrame (the **`tail`** function returns the last 5 rows).

In [22]:
channels_df.head()

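|   | Channel      | Date       | Views | Comments |
|---|--------------|------------|-------|----------|
| 0 | WorldNews    | 2015-01-01 | 102   | 20       |
| 1 | WorldWeather | 2015-01-01 | 74    | 14       |
| 2 | WorldNews    | 2015-01-02 | 100   | 32       |
| 3 | WorldWeather | 2015-01-02 | 318   | 24       |
| 4 | WorldNews    | 2015-01-03 | 278   | 28       |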
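Both `head` and `tail` take an optional row count `n` that defaults to 5. A small sketch on a DataFrame rebuilt from the five rows above:

```python
import pandas as pd

# Rows copied from the head() output above
channels_sample = pd.DataFrame({
    'Channel': ['WorldNews', 'WorldWeather', 'WorldNews', 'WorldWeather', 'WorldNews'],
    'Date': ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-02', '2015-01-03'],
    'Views': [102, 74, 100, 318, 278],
    'Comments': [20, 14, 32, 24, 28],
})

print(channels_sample.head(2))  # first two rows (index 0 and 1)
print(channels_sample.tail(2))  # last two rows (index 3 and 4)
```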


Select rows only for the 'WorldNews' channel.

In [23]:
worldnews_df = channels_df[channels_df['Channel'] == 'WorldNews']
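The expression inside the brackets is a boolean mask: one True/False per row, and indexing with it keeps only the True rows. Several conditions can be combined with `&` (and) or `|` (or), each wrapped in its own parentheses. A sketch on the five sample rows from `head()`; the 200-view threshold is an arbitrary value picked for illustration:

```python
import pandas as pd

channels_sample = pd.DataFrame({
    'Channel': ['WorldNews', 'WorldWeather', 'WorldNews', 'WorldWeather', 'WorldNews'],
    'Date': ['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-02', '2015-01-03'],
    'Views': [102, 74, 100, 318, 278],
    'Comments': [20, 14, 32, 24, 28],
})

# The comparison produces a boolean mask with one entry per row
is_news = channels_sample['Channel'] == 'WorldNews'
print(is_news.tolist())  # [True, False, True, False, True]

# Indexing with the mask keeps only the rows where it is True
news_only = channels_sample[is_news]

# Combine conditions with & (and) or | (or); parentheses are required
busy_news = channels_sample[(channels_sample['Channel'] == 'WorldNews') & (channels_sample['Views'] > 200)]
print(busy_news)  # only the 2015-01-03 row, with 278 views
```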

Select a random sample of rows from the new DataFrame.

In [24]:
worldnews_df.sample(3)

|     | Channel   | Date       | Views | Comments |
|-----|-----------|------------|-------|----------|
| 234 | WorldNews | 2015-04-28 | 365   | 9        |
| 356 | WorldNews | 2015-06-28 | 175   | 10       |
| 82  | WorldNews | 2015-02-11 | 491   | 16       |
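Because `sample` draws rows at random, rerunning the cell will usually show different rows. Passing `random_state` makes the draw reproducible, and `frac` samples a fraction of the rows instead of a fixed count. A sketch on the five sample rows (the seed values are arbitrary):

```python
import pandas as pd

channels_sample = pd.DataFrame({
    'Channel': ['WorldNews', 'WorldWeather', 'WorldNews', 'WorldWeather', 'WorldNews'],
    'Views': [102, 74, 100, 318, 278],
})

# The same random_state gives the same rows on every call
first = channels_sample.sample(3, random_state=42)
second = channels_sample.sample(3, random_state=42)
print(first.index.equals(second.index))  # True

# frac= samples a fraction of the rows: 40% of 5 rows is 2 rows
print(len(channels_sample.sample(frac=0.4, random_state=0)))  # 2
```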


Select the rows for the 'WorldNews' channel with fewer than 100 views.

In [25]:
worldnews_less_views_df = worldnews_df[worldnews_df['Views'] < 100]

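Count the days on which 'WorldNews' received fewer than 100 views. Each row represents one day, so we count rows with **`len`**; the **`size`** attribute would instead return rows × columns (here 27 × 4 = 108).

In [26]:
len(worldnews_less_views_df)

27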
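Alternatively, the count can be taken straight from the boolean mask, since summing it treats True as 1 and False as 0. The tiny sample below has no 'WorldNews' day under 100 views, so this sketch uses a hypothetical threshold of 150 just to get a non-zero count:

```python
import pandas as pd

channels_sample = pd.DataFrame({
    'Channel': ['WorldNews', 'WorldWeather', 'WorldNews', 'WorldWeather', 'WorldNews'],
    'Views': [102, 74, 100, 318, 278],
})

news = channels_sample[channels_sample['Channel'] == 'WorldNews']

# Summing a boolean mask counts its True values (threshold 150 is hypothetical)
low_view_days = (news['Views'] < 150).sum()
print(low_view_days)  # 2

# Same answer from the row count of the filtered DataFrame
print(len(news[news['Views'] < 150]))  # 2
```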

### 3. Grouping, Aggregating, and Pivoting data