# How to Import Data in Python

## Learning Objectives
Python is popular for machine learning in part because it supports powerful, easy-to-use packages that are purpose-built for data analysis. One of these is the **pandas** package. This exercise gives a brief introduction to pandas and shows how to import data with the functions it provides. By the end of this tutorial, you will have learned:

+ what a pandas Series is
+ what a pandas DataFrame is
+ how to import data from a CSV file
+ how to import data from an Excel file

## The pandas Package

In [13]:
%pip install pandas
import pandas as pd


## The pandas Series

In [15]:
members = ["Brazil", "Russia", "India", "China", "South Africa"]
brics1 = pd.Series(members) 
brics1

0          Brazil
1          Russia
2           India
3           China
4    South Africa
dtype: object

In [16]:
type(brics1)

pandas.core.series.Series
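
A Series pairs each value with an index label; when none are given, pandas assigns the integers starting at 0, as in the output above. As a minimal sketch, the labels can also be set explicitly. The two-letter codes and the name `brics_labeled` are illustrative assumptions, not part of the original exercise.

```python
import pandas as pd

members = ["Brazil", "Russia", "India", "China", "South Africa"]
# Hypothetical index labels, chosen only for illustration
codes = ["BR", "RU", "IN", "CN", "ZA"]

brics_labeled = pd.Series(members, index=codes)

# Look up a value by label, or by integer position with .iloc
print(brics_labeled["IN"])
print(brics_labeled.iloc[0])
```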

## The pandas DataFrame

In [18]:
# Build a DataFrame from a dict: each key becomes a column name,
# and each list holds that column's values.
members = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
           "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
           "gdp": [2750, 1658, 3202, 15270, 370],
           "literacy": [.944, .997, .721, .964, .943],
           "expectancy": [76.8, 72.7, 68.8, 76.4, 63.6],
           "population": [210.87, 143.96, 1367.09, 1415.05, 57.4]}
brics2 = pd.DataFrame(members)
brics2

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---:|---:|---:|---:|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |


In [19]:
type(brics2)

pandas.core.frame.DataFrame

In [20]:
# Build the same DataFrame from a list of rows;
# the column names are supplied separately via `columns`.
members = [["Brazil", "Brasilia", 2750, 0.944, 76.8, 210.87],
           ["Russia", "Moscow", 1658, 0.997, 72.7, 143.96],
           ["India", "New Delhi", 3202, 0.721, 68.8, 1367.09],
           ["China", "Beijing", 15270, 0.964, 76.4, 1415.05],
           ["South Africa", "Pretoria", 370, 0.943, 63.6, 57.4]]
labels = ["country", "capital", "gdp", "literacy", "expectancy", "population"]
brics3 = pd.DataFrame(members, columns=labels)
brics3

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---:|---:|---:|---:|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |
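
Both construction routes, a dict of columns and a list of rows, describe the same table. A quick way to check what a DataFrame holds is to inspect its shape and column types, and to note that selecting a single column returns a Series. The following is a minimal sketch on a trimmed-down copy of the data; the names `from_dict` and `from_rows` are assumptions for illustration.

```python
import pandas as pd

# Route 1: a dict mapping column names to column values
from_dict = pd.DataFrame({
    "country": ["Brazil", "Russia"],
    "gdp": [2750, 1658],
    "literacy": [0.944, 0.997],
})

# Route 2: a list of rows plus separate column labels
from_rows = pd.DataFrame(
    [["Brazil", 2750, 0.944], ["Russia", 1658, 0.997]],
    columns=["country", "gdp", "literacy"],
)

print(from_dict.shape)               # (number of rows, number of columns)
print(from_dict.dtypes)              # one dtype per column
print(type(from_dict["gdp"]))        # a single column is a Series
print(from_dict.equals(from_rows))   # compare the two frames
```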


## How to Import Data from a CSV File

In [21]:
brics4 = pd.read_csv("brics.csv")
brics4

|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---:|---:|---:|---:|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |
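
The file `brics.csv` is not reproduced in this tutorial, so the sketch below assumes its contents match the table above and reads them from an in-memory string instead of a file on disk. It also shows `index_col`, an optional `read_csv` parameter that uses one of the columns as the row labels. The names `csv_text`, `brics_demo`, and `brics_by_country` are illustrative.

```python
import io
import pandas as pd

# Assumed file contents, mirroring the table shown in this tutorial
csv_text = """country,capital,gdp,literacy,expectancy,population
Brazil,Brasilia,2750,0.944,76.8,210.87
Russia,Moscow,1658,0.997,72.7,143.96
India,New Delhi,3202,0.721,68.8,1367.09
China,Beijing,15270,0.964,76.4,1415.05
South Africa,Pretoria,370,0.943,63.6,57.4
"""

# read_csv accepts a file path or any file-like object
brics_demo = pd.read_csv(io.StringIO(csv_text))

# Use the country column as the row labels instead of 0..4
brics_by_country = pd.read_csv(io.StringIO(csv_text), index_col="country")
print(brics_by_country.loc["India", "gdp"])
```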


## How to Import Data from an Excel File

In [24]:
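# pandas relies on the openpyxl package to read .xlsx files
%pip install openpyxl
# With no sheet_name given, read_excel loads the first worksheet
brics5 = pd.read_excel("brics.xlsx")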
brics5


|   | country | capital | gdp | literacy | expectancy | population |
|---|---|---|---:|---:|---:|---:|
| 0 | Brazil | Brasilia | 2750 | 0.944 | 76.8 | 210.87 |
| 1 | Russia | Moscow | 1658 | 0.997 | 72.7 | 143.96 |
| 2 | India | New Delhi | 3202 | 0.721 | 68.8 | 1367.09 |
| 3 | China | Beijing | 15270 | 0.964 | 76.4 | 1415.05 |
| 4 | South Africa | Pretoria | 370 | 0.943 | 63.6 | 57.4 |


In [27]:
# Read a specific worksheet by name instead of the default first sheet
brics6 = pd.read_excel("brics.xlsx", sheet_name="Summits")
brics6

|   | summit | date | host | leader | location |
|---|---|---|---|---|---|
| 0 | 1st | June 16th, 2009 | Russia | Dmitry Medvedev | Yekaterinburg (Sevastianov's House) |
| 1 | 2nd | April 15th, 2010 | Brazil | Luiz Inácio Lula da Silva | Brasília (Itamaraty Palace) |
| 2 | 3rd | April 14th, 2011 | China | Hu Jintao | Sanya (Sheraton Sanya Resort) |
| 3 | 4th | March 29th, 2012 | India | Manmohan Singh | New Delhi (Taj Mahal Hotel) |
| 4 | 5th | March 26th – 27th, 2013 | South Africa | Jacob Zuma | Durban (Durban ICC) |
| 5 | 6th | July 14th – 17th, 2014 | Brazil | Dilma Rousseff | Fortaleza (Centro de Eventos do Ceará) |
| 6 | 7th | July 8th – 9th, 2015 | Russia | Vladimir Putin | Ufa (Congress Hall) |
| 7 | 8th | October 15th – 16th, 2016 | India | Narendra Modi | Benaulim (Taj Exotica) |
| 8 | 9th | September 3th – 5th, 2017 | China | Xi Jinping | Xiamen (Xiamen International Conference Center) |
| 9 | 10th | July 25th – 27th, 2018 | South Africa | Cyril Ramaphosa | Johannesburg (Sandton Convention Centre) |
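
When a workbook holds several worksheets, passing `sheet_name=None` to `read_excel` returns a dict that maps each sheet name to its own DataFrame. The sketch below builds a small two-sheet workbook in memory so that it runs without `brics.xlsx`. It requires `openpyxl`, and the sheet name `Members` and the trimmed-down data are assumptions for illustration, not the contents of the real file.

```python
import io
import pandas as pd

members = pd.DataFrame({"country": ["Brazil", "Russia"],
                        "capital": ["Brasilia", "Moscow"]})
summits = pd.DataFrame({"summit": ["1st", "2nd"],
                        "host": ["Russia", "Brazil"]})

# Write both frames to an in-memory workbook (needs openpyxl)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    members.to_excel(writer, sheet_name="Members", index=False)  # hypothetical sheet name
    summits.to_excel(writer, sheet_name="Summits", index=False)
buffer.seek(0)

# sheet_name=None loads every worksheet into a dict of DataFrames
sheets = pd.read_excel(buffer, sheet_name=None)
print(list(sheets))          # the sheet names, in workbook order
print(sheets["Summits"])
```

Loading all sheets at once is convenient for small workbooks; for large files, reading only the sheet you need by name avoids parsing the rest.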
