# How to Import Data in Python

## Learning Objectives
Python is a popular programming language for machine learning in part because it supports powerful, easy-to-use packages that are purpose-built for data analysis. One of these is the **pandas** package. This tutorial gives a brief introduction to pandas and shows how to import data using the functions it provides. By the end of this tutorial, you will have learned:

+ what a pandas Series is
+ what a pandas DataFrame is
+ how to import data from a CSV file
+ how to import data from an Excel file

## The pandas Package

In [2]:
import pandas as pd

## The pandas Series

In [2]:
members = ["Brazil", "Russia", "India", "China", "South Africa"]

In [6]:
series = pd.Series(members)
series

0          Brazil
1          Russia
2           India
3           China
4    South Africa
dtype: object
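
A Series pairs each value with an index label, which defaults to the integers 0, 1, 2, and so on. The following is a minimal sketch of how values might be looked up by label and by position. The two-letter `codes` labels are an assumption made for illustration and do not appear in the original data.

```python
import pandas as pd

members = ["Brazil", "Russia", "India", "China", "South Africa"]

# With no index given, pandas assigns the integer labels 0..4
series = pd.Series(members)
first = series.loc[0]      # lookup by index label
last = series.iloc[-1]     # lookup by position (last element)

# Hypothetical custom labels, chosen only for this example
codes = ["BR", "RU", "IN", "CN", "ZA"]
labeled = pd.Series(members, index=codes)
india = labeled.loc["IN"]

print(first, last, india)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup unambiguous, which matters once the index is no longer the default integer range.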

## The pandas DataFrame

In [7]:
members = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
           "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
           "gdp": [2750, 1658, 3202, 15270, 370],
           "literacy": [0.944, 0.997, 0.721, 0.964, 0.943],
           "expectancy": [76.8, 72.7, 68.8, 76.4, 63.6],
           "population": [210.87, 143.96, 1367.09, 1415.05, 57.4]}

In [8]:
df = pd.DataFrame(members)

df.head()

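|   | country | capital | gdp | literacy | expectancy | population |
|---|---------|---------|-----|----------|------------|------------|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |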


In [11]:
members = [["Brazil", "Brasilia", 2750, 0.944, 76.8, 210.87],
           ["Russia", "Moscow", 1658, 0.997, 72.7, 143.96],
           ["India", "New Delhi", 3202, 0.721, 68.8, 1367.09],
           ["China", "Beijing", 15270, 0.964, 76.4, 1415.05],
           ["South Africa", "Pretoria", 370, 0.943, 63.6, 57.4]]

labels = ["country", "capital", "gdp", "literacy", "expectancy", "population"]

df = pd.DataFrame(members, columns=labels)
df.head()

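|   | country | capital | gdp | literacy | expectancy | population |
|---|---------|---------|-----|----------|------------|------------|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |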
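
Once a DataFrame exists, a few attributes are commonly used to check that it has the expected structure. The sketch below uses a reduced version of the data above (three of the six columns, to keep it short) and shows one way to inspect the shape, list the column names, select a column, and look up a row by a computed label.

```python
import pandas as pd

# A subset of the columns used above, kept small for readability
members = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
           "gdp": [2750, 1658, 3202, 15270, 370],
           "population": [210.87, 143.96, 1367.09, 1415.05, 57.4]}
df = pd.DataFrame(members)

n_rows, n_cols = df.shape           # number of rows and columns
column_names = list(df.columns)     # column labels in order
gdp = df["gdp"]                     # selecting one column returns a Series

# idxmax() gives the index label of the largest value in the column
largest = df.loc[df["gdp"].idxmax(), "country"]

print(n_rows, n_cols, column_names, largest)
```

Selecting a single column returns a Series, so everything that applies to a Series also applies to each column of a DataFrame.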


## How to Import Data from a CSV File

In [12]:
df = pd.read_csv('./data/brics.csv')
df.head()

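|   | country | capital | gdp | literacy | expectancy | population |
|---|---------|---------|-----|----------|------------|------------|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |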
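
If a CSV file was written with its row index included, the first header cell is empty and `read_csv` names that column `Unnamed: 0`. The sketch below illustrates this with an in-memory CSV and shows how the `index_col` parameter can be used to treat that column as the index instead. The `csv_text` content is an assumption made for this example; the actual layout of `brics.csv` is not shown in this tutorial.

```python
import io
import pandas as pd

# Hypothetical CSV content with an unnamed leading index column
csv_text = """,country,capital,gdp
0,Brazil,Brasilia,2750
1,Russia,Moscow,1658
2,India,New Delhi,3202
"""

# By default the unnamed first column is read as ordinary data
default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))
print(list(indexed.columns))
```

`io.StringIO` stands in for a file here so the example runs without any data on disk; passing a file path to `read_csv` works the same way.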


## How to Import Data from an Excel File

In [17]:
df = pd.read_excel('./data/brics.xlsx', engine='openpyxl')
df.head()

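|   | country | capital | gdp | literacy | expectancy | population |
|---|---------|---------|-----|----------|------------|------------|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |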


In [18]:
# The workbook contains multiple sheets; use sheet_name to read a specific one
df = pd.read_excel("./data/brics.xlsx", engine='openpyxl', sheet_name='Summits')
df.head()

|   | summit | date | host | leader | location |
|---|--------|------|------|--------|----------|
| 0 | 1st | June 16th, 2009 | Russia | Dmitry Medvedev | Yekaterinburg (Sevastianov's House) |
| 1 | 2nd | April 15th, 2010 | Brazil | Luiz Inácio Lula da Silva | Brasília (Itamaraty Palace) |
| 2 | 3rd | April 14th, 2011 | China | Hu Jintao | Sanya (Sheraton Sanya Resort) |
| 3 | 4th | March 29th, 2012 | India | Manmohan Singh | New Delhi (Taj Mahal Hotel) |
| 4 | 5th | March 26th – 27th, 2013 | South Africa | Jacob Zuma | Durban (Durban ICC) |
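
When the sheet names in a workbook are not known in advance, passing `sheet_name=None` to `read_excel` returns a dictionary that maps each sheet name to its DataFrame. The sketch below is self-contained: it first writes a small two-sheet workbook to a temporary directory and then reads it back. The file name `example.xlsx`, the sheet name `Members`, and the abbreviated data are assumptions made for illustration, and the example assumes the `openpyxl` package is installed.

```python
import os
import tempfile
import pandas as pd

# Small stand-in data; not the full contents of brics.xlsx
members = pd.DataFrame({"country": ["Brazil", "Russia"],
                        "capital": ["Brasilia", "Moscow"]})
summits = pd.DataFrame({"summit": ["1st", "2nd"],
                        "host": ["Russia", "Brazil"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.xlsx")

    # Write both DataFrames to separate sheets of one workbook
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        members.to_excel(writer, sheet_name="Members", index=False)
        summits.to_excel(writer, sheet_name="Summits", index=False)

    # sheet_name=None reads every sheet into a dict keyed by sheet name
    sheets = pd.read_excel(path, engine="openpyxl", sheet_name=None)

sheet_names = list(sheets)
print(sheet_names)
print(sheets["Summits"])
```

Reading all sheets at once is convenient for small workbooks; for large ones, reading only the needed sheet by name avoids loading data that will not be used.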
