This tutorial has been taken from **[Data School](https://www.youtube.com/channel/UCnVzApLJE2ljPZSeQylSEyg)**.<br>
The link for the tutorial video can be found [here](https://youtu.be/B-r9VuK80dk).

In [1]:
import pandas as pd

## How to read only some selected columns from a csv file?

In [4]:
ufo = pd.read_csv('http://bit.ly/uforeports')

In [5]:
ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

In [6]:
ufo = pd.read_csv('http://bit.ly/uforeports', usecols=[0, 4])

In [7]:
ufo.columns

Index(['City', 'Time'], dtype='object')

In [8]:
ufo = pd.read_csv('http://bit.ly/uforeports', usecols=['State', 'City'])

In [9]:
ufo.columns

Index(['City', 'State'], dtype='object')
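
Note that `usecols` only selects columns. It ignores the order of the list, so the DataFrame keeps the columns in file order, which is why `City` comes before `State` above. The following minimal sketch uses a small in-memory CSV (via `io.StringIO`) as a stand-in for the UFO file, with sample rows copied from the output further below. It shows how to get the requested order by indexing after reading:

```python
import io

import pandas as pd

# Stand-in for the UFO reports file (two rows copied from the notebook output).
csv_text = """City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,6/1/1930 22:00
Willingboro,,OTHER,NJ,6/30/1930 20:00
"""

# usecols ignores list order: columns come back in the order they appear in the file.
subset = pd.read_csv(io.StringIO(csv_text), usecols=['State', 'City'])
print(list(subset.columns))  # ['City', 'State']

# Index with a list afterwards to get the columns in the order you asked for.
ordered = pd.read_csv(io.StringIO(csv_text), usecols=['State', 'City'])[['State', 'City']]
print(list(ordered.columns))  # ['State', 'City']
```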

## What is the fastest way to read a CSV file?
The question is unclear. <br>
However, if we only want a quick look at a DataFrame, we can import just the first few rows with `nrows`. This is usually faster, because less data needs to be parsed.

In [12]:
ufo = pd.read_csv('http://bit.ly/uforeports', nrows=3)

In [13]:
ufo

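|   | City | Colors Reported | Shape Reported | State | Time |
|---|------|-----------------|----------------|-------|------|
| 0 | Ithaca |  | TRIANGLE | NY | 6/1/1930 22:00 |
| 1 | Willingboro |  | OTHER | NJ | 6/30/1930 20:00 |
| 2 | Holyoke |  | OVAL | CO | 2/15/1931 14:00 |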
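
The next minimal sketch parses the three rows shown above from an in-memory CSV (via `io.StringIO`) instead of the real URL. It shows that `nrows` stops after the given number of data rows, and that `nrows` can be combined with `usecols` to load even less:

```python
import io

import pandas as pd

# Stand-in for the UFO reports file (rows copied from the output above).
csv_text = """City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,6/1/1930 22:00
Willingboro,,OTHER,NJ,6/30/1930 20:00
Holyoke,,OVAL,CO,2/15/1931 14:00
"""

# nrows counts data rows only; the header line is not included in the count.
preview = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(preview.shape)  # (2, 5)

# Restrict both rows and columns in a single call.
small = pd.read_csv(io.StringIO(csv_text), nrows=2, usecols=['City', 'State'])
print(small)
```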


## How to iterate over DataFrames and Series and select individual entries?

In [14]:
# for iterating over Series
for c in ufo.City:
    print(c)

Ithaca
Willingboro
Holyoke


In [16]:
# for iterating over a DataFrame
for index, row in ufo.iterrows():
    print(index, row.City, row.State)

0 Ithaca NY
1 Willingboro NJ
2 Holyoke CO
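
The loops above show iteration but not how to pick out a single entry. The following minimal sketch rebuilds the same three rows in memory, keeping only `City` and `State`, so it runs on its own. It selects individual values with `.loc`, `.at` and `.iloc`, and it shows `itertuples`, which the pandas documentation describes as generally faster than `iterrows`:

```python
import pandas as pd

# The same three rows as above, built in memory (only City and State kept).
ufo = pd.DataFrame({
    'City': ['Ithaca', 'Willingboro', 'Holyoke'],
    'State': ['NY', 'NJ', 'CO'],
})

# Select a single entry by index label and column name.
print(ufo.loc[1, 'City'])   # Willingboro
print(ufo.at[2, 'State'])   # CO  (.at is meant for fast scalar access)

# Select by integer position: row 0, column 1 (State).
print(ufo.iloc[0, 1])       # NY

# itertuples yields namedtuples; the index is available as row.Index.
for row in ufo.itertuples():
    print(row.Index, row.City, row.State)
```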


## What is the best way to drop non-numeric columns from a DataFrame?

In [17]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')

In [18]:
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

In [19]:
import numpy as np

In [20]:
drinks.select_dtypes(include=[np.number]).dtypes

beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
dtype: object
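
`select_dtypes` also accepts the string `'number'` as a shorthand for all numeric dtypes. Its `exclude` parameter works the other way round. The following minimal sketch runs on a hypothetical two-row stand-in for the drinks data. It uses the same column names, but the values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the drinks data (same column names, made-up values).
drinks = pd.DataFrame({
    'country': ['A', 'B'],
    'beer_servings': [10, 20],
    'total_litres_of_pure_alcohol': [1.5, 2.5],
    'continent': ['Asia', 'Europe'],
})

# 'number' and [np.number] select the same numeric columns.
numeric_str = drinks.select_dtypes(include='number')
numeric_np = drinks.select_dtypes(include=[np.number])

# exclude does the opposite: only the non-numeric columns remain.
non_numeric = drinks.select_dtypes(exclude='number')

print(list(numeric_str.columns))  # ['beer_servings', 'total_litres_of_pure_alcohol']
print(list(non_numeric.columns))  # ['country', 'continent']
```

To actually drop the non-numeric columns, assign the result back, e.g. `drinks = drinks.select_dtypes(include='number')`.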

## Why do we use brackets with `include` in `describe`? Why not just pass a string?

In [21]:
drinks.describe()

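|   | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---------------|-----------------|---------------|------------------------------|
| count | 193.0 | 193.0 | 193.0 | 193.0 |
| mean | 106.160622 | 80.994819 | 49.450777 | 4.717098 |
| std | 101.143103 | 88.284312 | 79.697598 | 3.773298 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 20.0 | 4.0 | 1.0 | 1.3 |
| 50% | 76.0 | 56.0 | 8.0 | 4.2 |
| 75% | 188.0 | 128.0 | 59.0 | 7.2 |
| max | 376.0 | 438.0 | 370.0 | 14.4 |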


In [22]:
drinks.describe(include='all')

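|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---------|---------------|-----------------|---------------|------------------------------|-----------|
| count | 193 | 193.0 | 193.0 | 193.0 | 193.0 | 193 |
| unique | 193 |  |  |  |  | 6 |
| top | Kyrgyzstan |  |  |  |  | Africa |
| freq | 1 |  |  |  |  | 53 |
| mean |  | 106.160622 | 80.994819 | 49.450777 | 4.717098 |  |
| std |  | 101.143103 | 88.284312 | 79.697598 | 3.773298 |  |
| min |  | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 25% |  | 20.0 | 4.0 | 1.0 | 1.3 |  |
| 50% |  | 76.0 | 56.0 | 8.0 | 4.2 |  |
| 75% |  | 188.0 | 128.0 | 59.0 | 7.2 |  |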


In [28]:
# This is simply how the API works: to specify
# more than one dtype, pass them together in a list
drinks.describe(include=['object', 'int64'])

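|   | country | beer_servings | spirit_servings | wine_servings | continent |
|---|---------|---------------|-----------------|---------------|-----------|
| count | 193 | 193.0 | 193.0 | 193.0 | 193 |
| unique | 193 |  |  |  | 6 |
| top | Kyrgyzstan |  |  |  | Africa |
| freq | 1 |  |  |  | 53 |
| mean |  | 106.160622 | 80.994819 | 49.450777 |  |
| std |  | 101.143103 | 88.284312 | 79.697598 |  |
| min |  | 0.0 | 0.0 | 0.0 |  |
| 25% |  | 20.0 | 4.0 | 1.0 |  |
| 50% |  | 76.0 | 56.0 | 8.0 |  |
| 75% |  | 188.0 | 128.0 | 59.0 |  |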


In [29]:
# a plain string works when including a single dtype
drinks.describe(include='object')

|   | country | continent |
|---|---------|-----------|
| count | 193 | 193 |
| unique | 193 | 6 |
| top | Kyrgyzstan | Africa |
| freq | 1 | 53 |


In [31]:
# However, a one-element list also works here,
# so a list is the more general way to
# specify dtypes
drinks.describe(include=['object'])

|   | country | continent |
|---|---------|-----------|
| count | 193 | 193 |
| unique | 193 | 6 |
| top | Kyrgyzstan | Africa |
| freq | 1 | 53 |
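
The same rule holds for other dtype names: `describe` accepts a single string, but several dtypes must go in a list. The following minimal sketch checks that a string and a one-element list give identical summaries. It uses a hypothetical in-memory DataFrame that holds only an integer and a float column, so the dtype names do not depend on how the pandas version stores text:

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({
    'servings': [10, 20, 30],   # int64
    'litres': [1.5, 2.5, 3.5],  # float64
})

# A single dtype can be given as a string or as a one-element list.
as_string = df.describe(include='int64')
as_list = df.describe(include=['int64'])
print(as_string.equals(as_list))  # True

# Several dtypes must be passed together in a list.
both = df.describe(include=['int64', 'float64'])
print(list(both.columns))  # ['servings', 'litres']
```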
