# Lab: Libraries

Welcome to the third part of IM939 lab 1. Here we are going to look at libraries and how they extend Python's core features.

## Modules

A module is a collection of code someone else has written (objects, methods, etc.). A library is a group of modules. In data science work you will use libraries such as `numpy` and `scikit-learn`. As we mentioned, Anaconda has many of these libraries built in and we do not need to install them separately.

Sometimes there are multiple ways to do something. Different libraries may have modules with similar functionality but slight differences.

For example, let's read a comma-separated values (CSV) file to make some data accessible in Python.

### CSV module

In [None]:
import csv

The `import` keyword loads a module. Here it loads the `csv` module from the Python standard library (see the [Python docs on the csv module](https://docs.python.org/3/library/csv.html)).

Please put the Facebook.csv file in a folder named `data` next to this notebook for this bit, since the code below opens `data/Facebook.csv`.

The code for going through a csv file is a bit lengthy.

In [None]:
with open('data/Facebook.csv', mode = 'r', encoding = 'UTF-8') as csvfile:    # open up our csv file
    reader = csv.reader(csvfile)                                         # create a reader object which looks at our csv file
    for row in reader:                                                   # for each row the reader finds
        print(row)                                                       # print out the current row

Ok, each row looks like a list. But printing it out is pretty messy. We should dump this into a list of dictionaries (so we can pick out particular values).

In [None]:
csv_content = []                                   # create list to put our data into

with open('data/Facebook.csv', mode = 'r', encoding = 'UTF-8') as csvfile:  # open up csv file
    reader = csv.DictReader(csvfile)                                   # create a reader which will read in each row and turn it into a dictionary
    for row in reader:                                                 # for each row
        csv_content.append(row)                                        # put the created dictionary into our list
        print(row)                                                     # print the dictionary so we can see it

Look at the first entry in csv_content.

In [None]:
csv_content[0]

Dictionaries have keys. The keys method of the dictionary object will show them to us (though they are obvious from printing the first entry above).

In [None]:
csv_content[0].keys()

In [None]:
csv_content[0]['post_message']

We have a list where each list element is a dictionary. So we need to index our list each time, hence the csv_content[0].
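
As a self-contained sketch of the same pattern, the snippet below runs `csv.DictReader` over a small in-memory CSV instead of Facebook.csv. The sample rows, the `likes` column, and the use of `io.StringIO` are assumptions made for illustration. Only the `post_message` column name comes from the code above.

```python
import csv
import io

# Hypothetical sample data standing in for data/Facebook.csv
sample_csv = "post_message,likes\nHello world,3\nSecond post,5\n"

csv_content = []
with io.StringIO(sample_csv) as csvfile:      # behaves like an open text file
    reader = csv.DictReader(csvfile)          # the first line supplies the dictionary keys
    for row in reader:
        csv_content.append(row)

# Index the list first, then the dictionary inside it
first_message = csv_content[0]['post_message']

# Collect one value from every dictionary in the list
all_messages = [post['post_message'] for post in csv_content]

print(first_message)
print(all_messages)
print(repr(csv_content[0]['likes']))          # the csv module reads every value as a string
```

Note that the csv module returns every value as a string, so numeric columns such as the assumed `likes` would need converting by hand (for example with `int(...)`).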

To go through our list we need to use a for loop. For example, to print each post message:

In [None]:
for post in csv_content:
    print(post['post_message'])

If I want to do data science, where accessing data in a sensible way is key, then there must be a better way! There is.

### Pandas

The Pandas library is designed with data science in mind. You will examine it more in the coming weeks. Reading CSV files with pandas is very easy. 

In [None]:
import pandas as pd

Importing with an alias (`import x as y`) is good practice. In the rest of this notebook, anything with a `pd.` prefix comes from pandas. This is particularly useful for keeping a library's names separate from built-in Python functionality and from your own variables.
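
A small sketch of why the prefix helps, using the standard library's `math` module rather than pandas so that it runs without any data. The local `sqrt` function is a hypothetical name invented for this example.

```python
import math as m

def sqrt(x):
    """Hypothetical local helper that happens to share a name with math.sqrt."""
    return f"my sqrt of {x}"

library_result = m.sqrt(16.0)   # the m. prefix makes it clear this comes from math
local_result = sqrt(16.0)       # no prefix, so this is our own function

print(library_result)
print(local_result)
```

Because the library function is only reachable through the `m.` prefix, the two `sqrt` names never collide.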

In [None]:
df = pd.read_csv('data/Facebook.csv', encoding = 'UTF-8')

That was easy. We can even pick out specific columns.

In [None]:
df['post_message']
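
To make the difference between columns and rows concrete, here is a minimal sketch on a tiny in-memory table. The sample data and the `likes` column are assumptions for illustration. Square brackets with a column name select a column, while `df.iloc[...]` selects a row by position.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data/Facebook.csv
sample_csv = "post_message,likes\nHello world,3\nSecond post,5\n"
df = pd.read_csv(io.StringIO(sample_csv))

messages = df['post_message']   # one column, returned as a pandas Series
first_row = df.iloc[0]          # the first row by position, also a Series

print(messages.tolist())
print(first_row['post_message'], first_row['likes'])
```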

And even look at the dataset in a pretty way.

In [None]:
df

Huzzah!

A final point: `pd.read_csv` returns a pandas-specific object (a DataFrame) with associated methods.

In [None]:
type(df)

In [None]:
df.dtypes

Here `int64` columns hold integers and `object` columns hold strings.
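
As a rough illustration of how these types are inferred, the sketch below reads a small in-memory CSV (hypothetical data, with an assumed `likes` column). Depending on the pandas version, text columns may be reported as `object` or as a dedicated string dtype, so the code checks the kind of dtype rather than its exact name.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data/Facebook.csv
sample_csv = "post_message,likes\nHello world,3\nSecond post,5\n"
df = pd.read_csv(io.StringIO(sample_csv))

print(type(df))     # the pandas DataFrame class
print(df.dtypes)    # one inferred dtype per column

# Check the kind of each column instead of relying on the dtype's printed name
likes_are_integers = pd.api.types.is_integer_dtype(df['likes'])
messages_are_strings = pd.api.types.is_string_dtype(df['post_message'])
print(likes_are_integers, messages_are_strings)
```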