# Plotting and Programming in Python 👑 💻 🐍

contact: marii nyröp / m.nyrop@columbia.edu
<hr>

### Table of Contents

1. [Running and Quitting](#1.-Running-and-Quitting)
2. [Variables and Assignment](#2.-Variables-and-Assignment)
3. [Data Types and Type Conversion](#3.-Data-Types-and-Type-Conversion)
4. [Built-in Functions and Help](#4.-Built-in-Functions-and-Help)
5. [Libraries](#5.-Libraries)
6. [Reading Tabular Data into DataFrames](#6.-Reading-Tabular-Data-into-DataFrames)

## Intro

- We'll be learning the basics of Python with an emphasis on research (as opposed to, say, app building), though most of this applies no matter what you do.

- We'll be using Python 3 within the interactive Jupyter Notebook environment, which researchers prefer because (1) you can write prose alongside your code to contextualize it, (2) it's a portable format that shows the code as well as the resulting output, and (3) it encourages reproducibility.

- We've set your Jupyter notebooks up to work with Anaconda, which you installed along with Python. Anaconda is what manages all the special Python *libraries* that you can choose to use in your Python projects.

- The libraries we'll use today are called *pandas* and *matplotlib*, which are two of the most used libraries for manipulating and visualizing data.

## Check-In

- Do you have `python-novice-gapminder-data.zip` downloaded and unpacked in your current directory?

In [None]:
!ls python-novice-gapminder-data.zip

- Can your notebook import the `pandas` library using Anaconda?

In [None]:
import pandas

<hr>

## 1. Running and Quitting

### Key Points:

- Python programs are plain text files.
- Use the Jupyter Notebook for editing and running Python.
- The Notebook has Command and Edit modes.
- Use the keyboard and mouse to select and edit cells.
- The Notebook will turn Markdown into pretty-printed documentation.
- Markdown does most of what HTML does.

<hr>

## 2. Variables and Assignment

[Slide #1: Variables](https://slides.com/marii/cul-swc-python#/1)


Use variables to store values.

In [None]:
age = 42
first_name = 'Ahmed'

Use `print` to display values.

In [None]:
print(first_name, 'is', age, 'years old')

Variables must be created before they are used.

In [None]:
print(last_name)

Variables can be used in calculations.

In [None]:
age = age + 3
print('Age in three years:', age)

Use an index to get a single character from a string.

In [None]:
atom_name = 'helium'
print(atom_name[0])

Use a slice to get a substring.

In [None]:
atom_name = 'sodium'
print(atom_name[0:3])
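A short sketch of how slice boundaries work: `[start:stop]` includes the character at `start` and stops just before `stop`, so the length of the slice is `stop - start`. The variable names below are illustrative, and the negative index is an extra not covered above.

```python
atom_name = 'sodium'

# A slice [start:stop] includes start but excludes stop.
first_three = atom_name[0:3]
last_three = atom_name[3:6]

# A negative index counts from the end of the string.
last_char = atom_name[-1]

print(first_three)   # sod
print(last_three)    # ium
print(last_char)     # m
```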

Use the built-in function `len` to find the length of a string.

In [None]:
print(len('helium'))

<hr>

## 3. Data Types and Type Conversion

[Slide #2: Data Types](https://slides.com/marii/cul-swc-python#/2)

Use the built-in function `type` to find the type of a value.

In [None]:
print(type(52))

In [None]:
fitness = 'average'
print(type(fitness))

Types control what operations (or methods) can be performed on a given value.

In [None]:
print(5 - 3)

In [None]:
print('hello' - 'h')

You can use the “+” and “\*” operators on strings.

In [None]:
full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)

In [None]:
separator = '=' * 10
print(separator)

Strings have a length (but numbers don’t).

In [None]:
print(len(full_name))

In [None]:
print(len(52))

You must convert numbers to strings or vice versa when operating on them.

In [None]:
print(1 + '2')

In [None]:
print(1 + int('2'))
print(str(1) + '2')
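A minimal sketch of a common case: a number arrives as text and has to be converted before doing arithmetic, then converted back to build a message. The variable names and values are made up for illustration.

```python
age_text = '42'          # a number stored as text
age = int(age_text)      # convert to an integer before doing arithmetic
next_year = age + 1

# Convert back to a string before joining with other strings.
message = 'Next year: ' + str(next_year)
print(message)           # Next year: 43

# float() handles text with a decimal point.
height = float('1.75')
print(height * 2)        # 3.5
```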

You can mix integers and floats freely in operations. (In Python 3, `/` always returns a float, even when dividing two integers. Python 2 behaved differently, so watch out!)

In [None]:
print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)
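A brief sketch of what mixing integers and floats produces in Python 3. The floor-division operator `//` is not used elsewhere in this lesson and is shown only for contrast.

```python
mixed_sum = 1 + 2.0     # int + float gives a float
true_div = 7 / 2        # / always gives a float in Python 3
floor_div = 7 // 2      # // on two integers gives an integer

print(mixed_sum, type(mixed_sum))   # 3.0 <class 'float'>
print(true_div, type(true_div))     # 3.5 <class 'float'>
print(floor_div, type(floor_div))   # 3 <class 'int'>
```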

Variables only change value when something is assigned to them.

In [None]:
first = 1
second = 5 * first
first = 2
print('first is', first, 'and second is', second)
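Unlike a spreadsheet cell, `second` does not track `first`. A small sketch of how to pick up the new value: run the assignment again. The names `before_update` and `after_update` are illustrative.

```python
first = 1
second = 5 * first       # computed once, here; second now stores 5
first = 2                # changing first does not touch second
before_update = second
print('second is still', before_update)   # second is still 5

second = 5 * first       # re-run the assignment to use the new value of first
after_update = second
print('second is now', after_update)      # second is now 10
```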

<hr>

## 4. Built-in Functions and Help 

Use comments to add documentation to programs.

[Slide #3: Functions + Syntax]()

In [None]:
# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.

A function may take zero or more arguments.

In [None]:
print('before')
print()
print('after')

Commonly-used built-in functions include `max`, `min`, and `round`.

In [None]:
print(max(1, 2, 3))
print(min('a', 'A', '0'))
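The second result may look surprising. Here is a short sketch of why, using the built-in `ord` (not introduced elsewhere in this lesson) to show the Unicode code point that Python compares for each character.

```python
# Strings are compared using the Unicode code points of their characters.
print(ord('0'), ord('A'), ord('a'))   # 48 65 97

smallest = min('a', 'A', '0')
largest = max('a', 'A', '0')
print(smallest)   # 0  (digits come before letters)
print(largest)    # a  (lowercase letters come after uppercase)
```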

Functions may only work for certain (combinations of) arguments.

In [None]:
print(max(1, 'a'))

Functions may have default values for some arguments.

In [None]:
round(3.712)

In [None]:
round(3.712, 1)
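A small sketch of what the default changes: with the second argument omitted, `round` returns an integer; with it supplied, it returns a float. `print` also has default values, shown here through its `sep` argument. The variable names are illustrative.

```python
whole = round(3.712)           # second argument omitted: nearest integer
one_decimal = round(3.712, 1)  # one decimal place

print(whole, type(whole))               # 4 <class 'int'>
print(one_decimal, type(one_decimal))   # 3.7 <class 'float'>

# print separates values with a space unless sep says otherwise.
print('a', 'b', 'c')            # a b c
print('a', 'b', 'c', sep='-')   # a-b-c
```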

Use the built-in function `help` to get help for a function.

In [None]:
help(round)

Python reports a syntax error when it can’t understand the source of a program.

In [None]:
# Forgot to close the quote marks around the string.
name = 'Feng

In [None]:
# An extra '=' in the assignment.
age = = 52

In [None]:
print("hello world"

Python reports a runtime error when something goes wrong while a program is executing.

In [None]:
age = 53
remaining = 100 - aege # mis-spelled 'age'

The Jupyter Notebook has two ways to get help.


- Place the cursor inside the parentheses of the function, hold down `shift`, and press `tab`.
- Or type a function name with a question mark after it.

In [None]:
round()

Every function returns something.

In [None]:
result = print('example')
print('result of print is', result)
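A minimal sketch contrasting a function that returns a useful value with one that returns `None`. The variable names are illustrative only.

```python
length = len('example')      # len returns an integer
printed = print('example')   # print displays text but returns None

print('len returned', length)      # len returned 7
print('print returned', printed)   # print returned None
print(printed is None)             # True
```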

<hr>

## 5. Libraries

[Slide #4: What are Libraries?]()

A library is a collection of modules. The terms are often used interchangeably, especially since many libraries consist of a single module, so don’t worry if you mix them.

A program must import a library module before using it.

In [None]:
import math

print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))

Use `help` to learn about the contents of a library module.

In [None]:
help(math)

Import specific items from a library module to shorten programs.

In [None]:
from math import cos, pi

print('cos(pi) is', cos(pi))

Create an alias for a library module when importing it to shorten programs.

In [None]:
import math as m

print('cos(pi) is', m.cos(m.pi))
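The three import styles reach the same module and the same function. A short sketch that puts them side by side:

```python
import math
import math as m
from math import cos, pi

# All three spellings call the same function on the same constant.
print(math.cos(math.pi))   # -1.0
print(m.cos(m.pi))         # -1.0
print(cos(pi))             # -1.0

# The alias is just another name for the same module.
print(m is math)           # True
```

Aliases and specific imports save typing, but the full `math.cos` form makes it most obvious where a name comes from.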

<hr> 

## 6. Reading Tabular Data into DataFrames


[Slide #5: What is Tabular Data?]()

Use the Pandas library to do statistics on tabular data.

In [1]:
import pandas

data = pandas.read_csv('data/gapminder_gdp_oceania.csv')
data

|  | country | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | gdpPercap_2002 | gdpPercap_2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Australia | 10039.59564 | 10949.64959 | 12217.22686 | 14526.12465 | 16788.62948 | 18334.19751 | 19477.00928 | 21888.88903 | 23424.76683 | 26997.93657 | 30687.75473 | 34435.36744 |
| 1 | New Zealand | 10556.57566 | 12247.39532 | 13175.678 | 14463.91893 | 16046.03728 | 16233.7177 | 17632.4104 | 19007.19129 | 18363.32494 | 21050.41377 | 23189.80135 | 25185.00911 |


Use `index_col` to specify that a column’s values should be used as row headings.

In [3]:
data = pandas.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')
data

| country | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | gdpPercap_2002 | gdpPercap_2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Australia | 10039.59564 | 10949.64959 | 12217.22686 | 14526.12465 | 16788.62948 | 18334.19751 | 19477.00928 | 21888.88903 | 23424.76683 | 26997.93657 | 30687.75473 | 34435.36744 |
| New Zealand | 10556.57566 | 12247.39532 | 13175.678 | 14463.91893 | 16046.03728 | 16233.7177 | 17632.4104 | 19007.19129 | 18363.32494 | 21050.41377 | 23189.80135 | 25185.00911 |
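A self-contained sketch of what `index_col` changes, assuming pandas is installed. To avoid depending on the data file, it reads from an in-memory string via `io.StringIO`. That string is a stand-in holding two of the file's columns, with values copied from the output above.

```python
import io
import pandas

# Stand-in for data/gapminder_gdp_oceania.csv (two columns only).
csv_text = """country,gdpPercap_1952,gdpPercap_2007
Australia,10039.59564,34435.36744
New Zealand,10556.57566,25185.00911
"""

default_index = pandas.read_csv(io.StringIO(csv_text))
by_country = pandas.read_csv(io.StringIO(csv_text), index_col='country')

print(list(default_index.index))   # [0, 1]
print(list(by_country.index))      # ['Australia', 'New Zealand']

# With index_col, 'country' labels the rows and is no longer a data column.
print(list(by_country.columns))    # ['gdpPercap_1952', 'gdpPercap_2007']
```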


Use `DataFrame.info` to find out more about a dataframe.

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, Australia to New Zealand
Data columns (total 12 columns):
gdpPercap_1952    2 non-null float64
gdpPercap_1957    2 non-null float64
gdpPercap_1962    2 non-null float64
gdpPercap_1967    2 non-null float64
gdpPercap_1972    2 non-null float64
gdpPercap_1977    2 non-null float64
gdpPercap_1982    2 non-null float64
gdpPercap_1987    2 non-null float64
gdpPercap_1992    2 non-null float64
gdpPercap_1997    2 non-null float64
gdpPercap_2002    2 non-null float64
gdpPercap_2007    2 non-null float64
dtypes: float64(12)
memory usage: 208.0+ bytes


The `DataFrame.columns` variable stores information about the dataframe’s columns.

In [5]:
data.columns

Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962', 'gdpPercap_1967',
       'gdpPercap_1972', 'gdpPercap_1977', 'gdpPercap_1982', 'gdpPercap_1987',
       'gdpPercap_1992', 'gdpPercap_1997', 'gdpPercap_2002', 'gdpPercap_2007'],
      dtype='object')

Use `DataFrame.T` to transpose a dataframe, that is, to switch its columns and rows.

In [6]:
data.T

| country | Australia | New Zealand |
|---|---|---|
| gdpPercap_1952 | 10039.59564 | 10556.57566 |
| gdpPercap_1957 | 10949.64959 | 12247.39532 |
| gdpPercap_1962 | 12217.22686 | 13175.678 |
| gdpPercap_1967 | 14526.12465 | 14463.91893 |
| gdpPercap_1972 | 16788.62948 | 16046.03728 |
| gdpPercap_1977 | 18334.19751 | 16233.7177 |
| gdpPercap_1982 | 19477.00928 | 17632.4104 |
| gdpPercap_1987 | 21888.88903 | 19007.19129 |
| gdpPercap_1992 | 23424.76683 | 18363.32494 |
| gdpPercap_1997 | 26997.93657 | 21050.41377 |
| gdpPercap_2002 | 30687.75473 | 23189.80135 |
| gdpPercap_2007 | 34435.36744 | 25185.00911 |
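A small sketch of what transposing does to a dataframe's shape and labels, assuming pandas is installed. As a stand-in for the data file, it reads three of the columns from an in-memory string. The name `transposed` is illustrative.

```python
import io
import pandas

# Stand-in for the data file: three year columns for the two countries.
csv_text = """country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962
Australia,10039.59564,10949.64959,12217.22686
New Zealand,10556.57566,12247.39532,13175.678
"""
data = pandas.read_csv(io.StringIO(csv_text), index_col='country')
transposed = data.T

print(data.shape)         # (2, 3): 2 countries, 3 years
print(transposed.shape)   # (3, 2): 3 years, 2 countries

# Row labels and column labels trade places.
print(list(transposed.index))
print(list(transposed.columns))
```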


Use `DataFrame.describe` to get summary statistics about data.

In [7]:
data.describe()

|  | gdpPercap_1952 | gdpPercap_1957 | gdpPercap_1962 | gdpPercap_1967 | gdpPercap_1972 | gdpPercap_1977 | gdpPercap_1982 | gdpPercap_1987 | gdpPercap_1992 | gdpPercap_1997 | gdpPercap_2002 | gdpPercap_2007 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| mean | 10298.08565 | 11598.522455 | 12696.45243 | 14495.02179 | 16417.33338 | 17283.957605 | 18554.70984 | 20448.04016 | 20894.045885 | 24024.17517 | 26938.77804 | 29810.188275 |
| std | 365.560078 | 917.644806 | 677.727301 | 43.986086 | 525.09198 | 1485.263517 | 1304.328377 | 2037.668013 | 3578.979883 | 4205.533703 | 5301.85368 | 6540.991104 |
| min | 10039.59564 | 10949.64959 | 12217.22686 | 14463.91893 | 16046.03728 | 16233.7177 | 17632.4104 | 19007.19129 | 18363.32494 | 21050.41377 | 23189.80135 | 25185.00911 |
| 25% | 10168.840645 | 11274.086022 | 12456.839645 | 14479.47036 | 16231.68533 | 16758.837652 | 18093.56012 | 19727.615725 | 19628.685413 | 22537.29447 | 25064.289695 | 27497.598692 |
| 50% | 10298.08565 | 11598.522455 | 12696.45243 | 14495.02179 | 16417.33338 | 17283.957605 | 18554.70984 | 20448.04016 | 20894.045885 | 24024.17517 | 26938.77804 | 29810.188275 |
| 75% | 10427.330655 | 11922.958888 | 12936.065215 | 14510.57322 | 16602.98143 | 17809.077557 | 19015.85956 | 21168.464595 | 22159.406358 | 25511.05587 | 28813.266385 | 32122.777857 |
| max | 10556.57566 | 12247.39532 | 13175.678 | 14526.12465 | 16788.62948 | 18334.19751 | 19477.00928 | 21888.88903 | 23424.76683 | 26997.93657 | 30687.75473 | 34435.36744 |
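To see where these numbers come from, here is a sketch that recomputes three of the 1952 statistics by hand with Python's standard `statistics` module rather than pandas. The two values are copied from the table above, and the variable names are illustrative. Note that `std` is the sample standard deviation (dividing by n - 1), and the 25% row is a linear interpolation between the two values.

```python
import statistics

gdp_1952 = [10039.59564, 10556.57566]   # Australia, New Zealand

mean = statistics.mean(gdp_1952)
std = statistics.stdev(gdp_1952)        # sample standard deviation (n - 1)

low = min(gdp_1952)
high = max(gdp_1952)
q25 = low + 0.25 * (high - low)         # a quarter of the way from min to max

print(round(mean, 5))   # 10298.08565
print(round(std, 6))    # 365.560078
print(round(q25, 6))    # 10168.840645
```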


<hr> 

## 7. Pandas DataFrames