# Programming with Python

## Session 4

## Libraries & Modules

One of the great things about Python is the free availability of a _huge_ number of libraries (also sometimes called packages) that can be imported into your code and (re)used. 

Modules contain functions (and other definitions, such as classes and constants) for use by other programs, and are developed to solve a particular problem or to provide particular, often domain-specific, capabilities. A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries consist of only a single module (so don’t worry if you mix them up). 

Before you can import a library, it must be installed on your system.  

A large number of libraries are already available for import in the standard distribution of Python: this is known as the standard library. If you installed the Anaconda distribution of Python, you have even more libraries already installed - mostly aimed at data science.
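
If you are unsure whether a library is installed, one way to check without actually importing it is sketched below. The helper name `is_installed` and the made-up library name are illustrative assumptions, not part of any library.

```python
import importlib.util

def is_installed(name):
    """Return True if Python can find a top-level module or package called name."""
    # find_spec returns None when no module of that name can be found
    return importlib.util.find_spec(name) is not None

print(is_installed("os"))                             # part of the standard library
print(is_installed("a_library_that_does_not_exist"))  # hypothetical name, not installed
```

The same check can be used for third-party names such as `pandas`; the result then depends on what is installed on your system.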

Importing a library is easy:

- The `import` keyword followed by the library name, for example: 
    - `import os    # contains functions for interacting with the operating system`
    - `import sys   # gives access to interpreter settings, including command line arguments`

Many more libraries are listed on the Python Package Index (PyPI): https://pypi.python.org/pypi

Hint: Import a library, or a module (a smaller part of a larger library), _whenever you need to do anything specialised_. Don't spend time rewriting something that someone else has already written for you!

In [None]:
import os

# Get current directory
cwd = os.getcwd()
print(cwd)

In [None]:
# Make new directory (exist_ok=True avoids an error if the cell is run twice)
os.makedirs('test_dir', exist_ok=True)

In [None]:
help(os) # manual page created from the module's docstrings

### Using loops to iterate through files in a directory

In [None]:
# define a function that lists all the files in the folder called data

import os

def read_each_filename(pathname):
    # os.listdir returns the names of the entries (files and sub-folders) in pathname
    for filename in os.listdir(pathname):
        print(filename)

datapath = 'data' # name of path with multiple files
read_each_filename(datapath)

In [None]:
# define a function that reads and prints each line of each file in the folder called data
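
One possible way to write such a function is sketched below. It assumes that the folder contains plain text files and that it is called `data`, as in the previous cell; the function name `print_each_line` is an illustrative choice.

```python
import os

def print_each_line(pathname):
    """Print every line of every regular file directly inside pathname."""
    for filename in sorted(os.listdir(pathname)):
        # os.listdir returns bare names, so join them with the folder path
        filepath = os.path.join(pathname, filename)
        if not os.path.isfile(filepath):
            continue  # skip sub-folders
        with open(filepath) as handle:
            for line in handle:
                print(line.rstrip("\n"))  # drop the trailing newline before printing

datapath = 'data'  # assumed folder name
if os.path.isdir(datapath):  # only run if the folder actually exists
    print_each_line(datapath)
```

Using `with open(...)` closes each file automatically once its lines have been read.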


### Examples of importing basic modules.

#### Questions
- How can I read tabular data?

#### Objectives
- Import the Pandas library.
- Use Pandas to load a simple CSV data set.
- Get some basic information about a Pandas DataFrame.

`Pandas` cheat sheet: https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf

In [None]:
import pandas

In [None]:
# Use Oceania data here

df = pandas.read_csv("data/gapminder_gdp_oceania.csv") # put data from a file in the dataframe

# examples of working with a Pandas dataframe
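
A few common ways to get basic information about a DataFrame are sketched below. So that the example runs without the data file, it builds a tiny DataFrame by hand; the column names and values are made-up placeholders, not Gapminder data.

```python
import pandas

# Hypothetical placeholder data (not taken from the Gapminder files)
df = pandas.DataFrame(
    {
        "country": ["A", "B", "C"],
        "value_2000": [1.0, 2.0, 3.0],
        "value_2010": [2.0, 4.0, 6.0],
    }
)

print(df.shape)       # (number of rows, number of columns)
print(df.columns)     # column labels
print(df.dtypes)      # data type of each column
df.info()             # concise summary: index, columns, non-null counts
print(df.describe())  # summary statistics for the numeric columns
```

The same attributes and methods should work on a DataFrame loaded with `read_csv`.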


#### Aside: Namespaces
Python uses namespaces extensively to keep functions, attributes, methods, etc. separate between modules and objects. When you import an entire module, the functions and classes available within that module are loaded under the module's namespace - `pandas` in the example above.  
It is possible to customise the namespace at the point of import, allowing you to e.g. shorten/abbreviate the module name to save some typing:

In [None]:
import pandas as pd

If you need only a single function from a module/library, you can import it directly into your main namespace, where you don't need to specify the module before the name of the function:

In [None]:
from pandas import read_csv
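
The three import styles can be compared side by side. This sketch uses the standard library's `math` module instead of `pandas` so that it runs without any extra installation; the alias `m` is an arbitrary choice.

```python
import math             # names are reached through the `math` namespace
import math as m        # same module, bound to a shorter alias
from math import sqrt   # a single name placed in the main namespace

print(math.sqrt(16))    # full module name as prefix
print(m.sqrt(16))       # alias as prefix
print(sqrt(16))         # no prefix needed

# All three refer to the same underlying module and function
print(m is math)
print(sqrt is math.sqrt)
```

Importing a name directly saves typing, but it can also hide which module a function came from, so the prefix form is often easier to read in longer programs.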

#### Conventions
- You should perform all of your imports at the beginning of your program. This ensures that
  - users can easily identify the dependencies of a program, and 
  - any missing dependencies (causing fatal `ImportError` exceptions) are caught early in execution.
- Shortening `numpy` to `np` and `pandas` to `pd` is very common, and there are other such abbreviations too. Watch out for this when, e.g., reading docs, guides, or Stack Overflow answers online.
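
To see what a missing dependency looks like, the sketch below tries to import a deliberately made-up library name and catches the resulting `ImportError`. Normally you would let the error stop the program; catching it here is only for illustration.

```python
try:
    import a_library_that_does_not_exist  # hypothetical name, assumed not to be installed
except ImportError as err:
    # err.name holds the name of the module that could not be found
    message = f"Missing dependency: {err.name}"
    print(message)
```

Because the import sits at the top of the program, the problem shows up immediately rather than part-way through a long analysis.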

## Exercise 4 - Reading other data

Read the data in `gapminder_gdp_americas.csv` (which should be in the same directory as `gapminder_gdp_oceania.csv`) into a variable called `americas` and display its summary statistics.

## Exercise 5 - Inspecting data

After reading the data for the Americas (exercise above), use `help(americas.head)` and `help(americas.tail)` to find out what `DataFrame.head` and `DataFrame.tail` do.

* What method call will display the first three rows of this data?
* What method call will display the last three columns of this data? (Hint: you may need to change your view of the data.)

## Exercise 6 - Reading files in other directories

The data for your current project is stored in a file called `microbes.csv`, which is located in a folder called `field_data`. You are doing analysis in a notebook called `analysis.ipynb` in a sibling folder called `thesis`:

```
your_home_directory
+-- field_data/
|   +-- microbes.csv
+-- thesis/
    +-- analysis.ipynb
```

What value(s) should you pass to `read_csv` to read `microbes.csv` in `analysis.ipynb`?

## Exercise 7 - Writing data

As well as the `read_csv` function for reading data from a file, Pandas provides a `to_csv` method on dataframes for writing them to files. Applying what you’ve learned about reading from files, write one of your dataframes to a file called `processed.csv`. You can use `help` to get information on how to use `to_csv`.

#### Aside: Your Own Modules
Whenever you write some Python code and save it as a script, with the `.py` file extension, you are creating your own module. If you define functions within that module, you can import them into other scripts and sessions.
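
A minimal sketch of this idea follows. To stay self-contained, it writes a small module file into a temporary folder and then imports it; in practice you would simply save the file next to your script or notebook. The module name `my_helpers` and the function `greet` are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Source code of the hypothetical module
module_source = '''
def greet(name):
    """Return a short greeting."""
    return f"Hello, {name}!"
'''

# Save the code as my_helpers.py in a temporary folder
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "my_helpers.py"), "w") as handle:
    handle.write(module_source)

# Tell Python where to look, then import the new module like any other
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import my_helpers

print(my_helpers.greet("world"))
```

The `sys.path` step is only needed here because the file lives in a temporary folder; a `.py` file in the same folder as your script can usually be imported directly.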

### Some Interesting Modules and Libraries to Investigate
- os
- sys
- shutil
- random
- collections
- math
- argparse
- time
- datetime
- numpy
- scipy
- matplotlib
- pandas
- scikit-learn
- requests
- biopython
- openpyxl
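
As a small taste of the list above, the sketch below uses three of the standard library modules (`collections`, `math`, and `datetime`), so it should run without installing anything. The inputs are arbitrary illustrative values.

```python
import collections
import datetime
import math

# collections.Counter counts how often each item occurs
letter_counts = collections.Counter("banana")
print(letter_counts.most_common(1))   # [('a', 3)]

# math provides common mathematical functions and constants
print(math.sqrt(16))                  # 4.0

# datetime handles calendar arithmetic, including month boundaries
next_day = datetime.date(2024, 1, 31) + datetime.timedelta(days=1)
print(next_day)                       # 2024-02-01
```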