
# Hands-on Lab: Loading data with Pandas


## Objectives


*   Use Pandas to access and view data


<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#Introduction-of-Pandas">Introduction of <code>Pandas</code></a></li>
        <li><a href="#Viewing-Data-and-Accessing-Data">Viewing Data and Accessing Data</a></li>
    </ul>

</div>

<hr>


## Introduction of <code>Pandas</code>


In [42]:
# Import required library

import pandas as pd


In [96]:
# Read data from CSV file
album = 'albumlist1.csv'
df1 = pd.read_csv(album)

In [44]:
# Print first five rows of the dataframe

df1.head()

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating
0,Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B",46.0,65,30-Nov-82,,10.0
1,AC/DC,Back in Black,1980,0:42:11,hard rock,26.1,50,25-Jul-80,,9.5
2,Pink Floyd,The Dark Side of the Moon,1973,0:42:49,progressive rock,24.2,45,01-Mar-73,,9.0
3,Whitney Houston,The Bodyguard,1992,0:57:44,"R&B, soul, pop",27.4,44,17-Nov-92,Y,8.5
4,Meat Loaf,Bat Out of Hell,1977,0:46:33,"hard rock, progressive rock",20.6,43,21-Oct-77,,8.0
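The cell above depends on the file <code>albumlist1.csv</code> being present. As a minimal sketch that runs without the file, the snippet below reads a few rows copied from the output above out of an in-memory string. The variable names <code>csv_text</code> and <code>sample</code> are illustrative and not part of the lab.

```python
import io

import pandas as pd

# A few rows copied from the table above; this text stands in for albumlist1.csv
csv_text = """Artist,Album,Released,Length,Genre
Michael Jackson,Thriller,1982,0:42:19,"pop, rock, R&B"
AC/DC,Back in Black,1980,0:42:11,hard rock
Pink Floyd,The Dark Side of the Moon,1973,0:42:49,progressive rock
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # Released is parsed as an integer column
print(sample.head(2))  # head(n) returns the first n rows
```

Quoted fields such as the Genre value keep their embedded commas, so each row still has five columns.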


To read an Excel file, we pass its path to the function <code>read_excel</code>. The result is a dataframe, as before:


In [97]:
# Read data from the Excel file and print the first five rows

df2 = pd.read_excel('albumlist1.xlsx')
df2.head()

Unnamed: 0,Artist,Album,Released,Length,Genre,Music Recording Sales (millions),Claimed Sales (millions),Released.1,Soundtrack,Rating
0,Michael Jackson,Thriller,1982,00:42:19,"pop, rock, R&B",46.0,65,1982-11-30,,10.0
1,AC/DC,Back in Black,1980,0:42:11,hard rock,26.1,50,1980-07-25,,9.5
2,Pink Floyd,The Dark Side of the Moon,1973,0:42:49,progressive rock,24.2,45,1973-03-01,,9.0
3,Whitney Houston,The Bodyguard,1992,0:57:44,"R&B, soul, pop",27.4,44,1992-11-17,Y,8.5
4,Meat Loaf,Bat Out of Hell,1977,0:46:33,"hard rock, progressive rock",20.6,43,1977-10-21,,8.0
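Comparing the two outputs, the <b>Released.1</b> column comes back as text such as <code>30-Nov-82</code> from the CSV file but as dates such as <code>1982-11-30</code> from the Excel file. The sketch below shows one way to convert the CSV-style text into dates. The format string is an assumption based on the values visible above, and the variable names are illustrative.

```python
import pandas as pd

# Date strings as they appear in the CSV output above
released_text = pd.Series(['30-Nov-82', '25-Jul-80', '01-Mar-73'])

# %d = day of month, %b = abbreviated month name, %y = two-digit year
released_dates = pd.to_datetime(released_text, format='%d-%b-%y')

print(released_dates)
```

Two-digit years are ambiguous, so it is worth checking the converted years before relying on them.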


We can access the column <b>Length</b> and assign it to a new dataframe <b>x</b>:


In [98]:
# Access the column Length

x = df2[['Length']]
x

Unnamed: 0,Length
0,00:42:19
1,0:42:11
2,0:42:49
3,0:57:44
4,0:46:33
5,0:43:08
6,01:15:54
7,0:40:01


## Viewing Data and Accessing Data


You can also get a column as a Series. You can think of a Pandas Series as a one-dimensional dataframe. Just use a single pair of brackets:


In [99]:
# Get the column as a series

x = df2['Length']
x

0    00:42:19
1     0:42:11
2     0:42:49
3     0:57:44
4     0:46:33
5     0:43:08
6    01:15:54
7     0:40:01
Name: Length, dtype: object

To get a column as a dataframe instead, use double brackets. For example, we can select the column <b>Artist</b> and check the type of the result:


In [100]:
# Get the column as a dataframe

x = df2[['Artist']]
type(x)

pandas.core.frame.DataFrame
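The difference between single and double brackets can be checked side by side. The following is a minimal sketch on a small hand-built dataframe that uses values from the table above; the name <code>albums</code> is illustrative and not part of the lab.

```python
import pandas as pd

# Small stand-in dataframe built from values shown earlier
albums = pd.DataFrame({
    'Artist': ['Michael Jackson', 'AC/DC', 'Pink Floyd'],
    'Length': ['0:42:19', '0:42:11', '0:42:49'],
})

as_series = albums['Length']    # single brackets -> Series
as_frame = albums[['Length']]   # double brackets -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The Series has one dimension, while the dataframe keeps two dimensions even with a single column.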

You can do the same thing for multiple columns. Write the dataframe name, in this case <code>df2</code>, followed by the list of column headers enclosed in double brackets. The result is a new dataframe made up of the specified columns:


In [101]:
# Access multiple columns

y = df2[['Artist','Length','Genre']]
y

Unnamed: 0,Artist,Length,Genre
0,Michael Jackson,00:42:19,"pop, rock, R&B"
1,AC/DC,0:42:11,hard rock
2,Pink Floyd,0:42:49,progressive rock
3,Whitney Houston,0:57:44,"R&B, soul, pop"
4,Meat Loaf,0:46:33,"hard rock, progressive rock"
5,Eagles,0:43:08,"rock, soft rock, folk rock"
6,Bee Gees,01:15:54,disco
7,Fleetwood Mac,0:40:01,soft rock
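Two details of multi-column selection are easy to verify on a small example: the columns come back in the order you list them, and a name that does not exist raises a <code>KeyError</code>. The sketch below assumes a hand-built dataframe named <code>albums</code>, which is not part of the lab.

```python
import pandas as pd

# Small stand-in dataframe built from values shown earlier
albums = pd.DataFrame({
    'Artist': ['Michael Jackson', 'AC/DC'],
    'Length': ['0:42:19', '0:42:11'],
    'Genre': ['pop, rock, R&B', 'hard rock'],
})

# Columns come back in the order listed, not the order in the source frame
subset = albums[['Genre', 'Artist']]
print(list(subset.columns))  # ['Genre', 'Artist']

# Asking for a column that does not exist raises KeyError
try:
    albums[['Artist', 'Rating']]
except KeyError as err:
    print('missing column:', err)
```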


The process is shown in the figure:


One way to access individual elements is the <code>iloc</code> indexer, which selects by integer position. You can access the 1st row and the 1st column as follows:


In [102]:
# Access the value on the first row and the first column

df2.iloc[0, 0]

'Michael Jackson'

You can access the 2nd row and the 1st column as follows:


In [103]:
# Access the value on the second row and the first column

df2.iloc[1, 0]

'AC/DC'

You can access the 1st row and the 3rd column as follows:


In [104]:
# Access the value on the first row and the third column

df2.iloc[0, 2]

1982

In [None]:
# Access the value on the second row and the third column
df2.iloc[1, 2]
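Positions passed to <code>iloc</code> start at 0, and negative positions count from the end. The following is a minimal sketch on a hand-built dataframe with the same first three columns as the album list; the variable names are illustrative.

```python
import pandas as pd

# Small stand-in dataframe built from values shown earlier
albums = pd.DataFrame({
    'Artist': ['Michael Jackson', 'AC/DC', 'Pink Floyd'],
    'Album': ['Thriller', 'Back in Black', 'The Dark Side of the Moon'],
    'Released': [1982, 1980, 1973],
})

first_artist = albums.iloc[0, 0]  # row 0, column 0
last_year = albums.iloc[-1, -1]   # last row, last column
second_row = albums.iloc[1]       # a whole row comes back as a Series

print(first_artist)
print(last_year)
print(second_row['Album'])
```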

You can also access a value using the column name with the <code>loc</code> indexer, which selects by label. The following return the same values as above:


In [105]:
# Access the value on the first row of the column Artist

df2.loc[0, 'Artist']

'Michael Jackson'

In [106]:
# Access the value on the second row of the column Artist

df2.loc[1, 'Artist']

'AC/DC'

In [107]:
# Access the value on the first row of the column Released

df2.loc[0, 'Released']

1982

In [108]:
# Access the value on the second row of the column Released

df2.loc[1, 'Released']

1980
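In the cells above the row labels happen to equal the row positions, which can hide the fact that <code>loc</code> works on labels. The sketch below makes the difference visible by using the artist names as the index. The <code>set_index</code> step and the variable names are assumptions added for illustration and are not part of the lab.

```python
import pandas as pd

# Small stand-in dataframe built from values shown earlier
albums = pd.DataFrame({
    'Artist': ['Michael Jackson', 'AC/DC', 'Pink Floyd'],
    'Album': ['Thriller', 'Back in Black', 'The Dark Side of the Moon'],
    'Released': [1982, 1980, 1973],
})

# Use the artist names as row labels instead of 0, 1, 2
by_artist = albums.set_index('Artist')

year_by_label = by_artist.loc['AC/DC', 'Released']  # lookup by row and column label
year_by_position = by_artist.iloc[1, 1]             # same cell by integer position

print(year_by_label, year_by_position)
```

After <code>set_index</code>, Artist is no longer a regular column, so Released sits at column position 1.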

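You can slice a dataframe by integer position with <code>iloc</code> or by label with <code>loc</code>. Note that an <code>iloc</code> slice excludes its end position, while a <code>loc</code> slice includes its end label: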


In [57]:
# Slicing the dataframe

df2.iloc[0:2, 0:3]

Unnamed: 0,Artist,Album,Released
0,Michael Jackson,Thriller,1982
1,AC/DC,Back in Black,1980


In [109]:
# Slicing the dataframe using name

df2.loc[0:2, 'Artist':'Released']

Unnamed: 0,Artist,Album,Released
0,Michael Jackson,Thriller,1982
1,AC/DC,Back in Black,1980
2,Pink Floyd,The Dark Side of the Moon,1973
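The two outputs above differ in row count because position slices and label slices treat the end point differently. The following is a minimal sketch on a hand-built dataframe; the name <code>albums</code> is illustrative and not part of the lab.

```python
import pandas as pd

# Small stand-in dataframe built from values shown earlier
albums = pd.DataFrame({
    'Artist': ['Michael Jackson', 'AC/DC', 'Pink Floyd', 'Whitney Houston'],
    'Album': ['Thriller', 'Back in Black', 'The Dark Side of the Moon', 'The Bodyguard'],
    'Released': [1982, 1980, 1973, 1992],
    'Length': ['0:42:19', '0:42:11', '0:42:49', '0:57:44'],
})

by_position = albums.iloc[0:2, 0:3]              # end excluded: rows 0 and 1
by_label = albums.loc[0:2, 'Artist':'Released']  # end included: rows 0, 1 and 2

print(by_position.shape)  # (2, 3)
print(by_label.shape)     # (3, 3)
```

Both slices select the same three columns, since position 3 is excluded by <code>iloc</code> and the label Released is included by <code>loc</code>.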
