# Intro to Python: Data Analysis
Part of the workshop series presented by the [IDEA Student Center at UC San Diego](http://www.jacobsschool.ucsd.edu/student/).

### Goals
Learn the basics of Python (the programming language) for data analysis:
- loading data from a file
- plotting
- vectorized calculations

### Requirements
- numpy: for loading data and vectorized calculations
- matplotlib: for plotting

In [None]:
# make the code compatible with both Python 2 and 3
from __future__ import print_function, division

## 1) Loading packages
Python is a general-purpose language, which means it can be used for a wide variety of problems. However, that also means it is not set up for engineering tasks out of the box (in that sense, it's more like C/C++ than Matlab). Fortunately, Python has a great community, and a large number of third-party packages have been created so that users can solve engineering problems with Python instead of Matlab. The two packages we'll focus on (which will help recreate most Matlab functionality) are:
- [NumPy](http://www.numpy.org/)
- [Matplotlib](http://matplotlib.org/)

But first, we need to load these packages.

In [None]:
# to load a package in Python (assuming it's installed), use the built-in
# ``import`` statement

# load NumPy
import numpy

# check the version of numpy
print( numpy.__version__ )

The numpy package has a lot of built-in code for performing common math-related programming tasks. Let's try a few out:

In [None]:
# the value of pi
print( numpy.pi )

In [None]:
# cosine function
print( numpy.cos(0) )

In [None]:
# create a vector of ones
print( numpy.ones(5) )
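
Arrays such as the one returned by ``numpy.ones`` also support vectorized calculations: arithmetic and numpy functions are applied to every element at once, without writing a loop. A minimal sketch (the variable names are only for illustration):

```python
import numpy

# start from a vector of ones
ones = numpy.ones(5)

# arithmetic is applied element by element (no loop needed)
scaled = 2.0 * ones + 1.0
print(scaled)

# numpy functions also accept whole arrays
angles = numpy.array([0.0, numpy.pi / 2.0, numpy.pi])
cosines = numpy.cos(angles)
print(cosines)
```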

At this point, you may already be tired of typing ``numpy`` over and over. You're not the only one, which is why Python's ``import`` statement has extra options, including the ability to assign an alias to a package name.

In [None]:
# load the numpy package, but refer to it by the alias ``np``
import numpy as np

# now whenever you type ``np``, Python knows you mean ``numpy``

# check the numpy version
print( np.__version__ )

# the value of pi
print( np.pi )

# sin(pi / 2)
print( np.sin(np.pi / 2.0) )

# Aaaawwww, so much nicer ^__^
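
Another option of ``import`` is to load individual names from a package with ``from ... import ...``. A short sketch, in case you only need a few items from numpy:

```python
# load only the names we need from numpy
from numpy import pi, cos

# the names can now be used without any prefix
result = cos(pi)
print(result)
```

The alias form (``import numpy as np``) is the more common convention, since the prefix makes it clear where each function comes from.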

## 2) Plotting data
Now let's try plotting something. First we'll load the plotting package (matplotlib) and then we'll plot some sample data.

In [None]:
# load a plotting library
import matplotlib.pyplot as plt

# make figures show up inside the notebook (instead of in
# a separate window)
%matplotlib inline

In [None]:
# a simple line plot

# create some "fake" data
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

# plot the data
plt.plot(x, y)

# show the plot
plt.show()

In [None]:
# let's do that again, but this time we'll try to make the plot look a
# bit more professional by changing the colors, line thicknesses, etc.

# create some "fake" data
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

# plot the data
plt.plot(x, y, color='black', linewidth=2.0)

# add a label to the x-axis
plt.xlabel('x')

# add a label to the y-axis
plt.ylabel('???')

# show the plot
plt.show()

# try to further improve the plot:
# - try other built-in colors (red, blue, green, etc.)
# - or try using a custom color using HEX codes (e.g. "#16a9c7" is a blue-ish color)
# - change the range of values shown on the y-axis with ``plt.ylim([y_min, y_max])``
# - change the linestyle of the plot with ``linestyle='--'`` (dashed) or ``linestyle=':'`` (dotted)
# - change the fontsize of the x-axis label with ``plt.xlabel('x', fontsize=18)``
#
# is there anything else you want to try changing?
#
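
One possible combination of the suggestions above, as a sketch rather than the "right" answer. The color is the HEX code mentioned in the comments; the y-axis range and font size are arbitrary choices.

```python
import matplotlib.pyplot as plt

# the same "fake" data as before
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

# dashed line in a custom HEX color
lines = plt.plot(x, y, color='#16a9c7', linewidth=2.0, linestyle='--')

# larger axis labels
plt.xlabel('x', fontsize=18)
plt.ylabel('y', fontsize=18)

# limit the range of values shown on the y-axis
plt.ylim([0, 30])

# show the plot
plt.show()
```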

In [None]:
# matplotlib has other plot types built-in

# create some sample data
# - pressure [atm]
# - temperature [K]
pressure = [1, 2, 3, 4, 5]
temperature = [273, 303, 310, 350, 365]

# example: scatter plot
plt.scatter(pressure, temperature)
plt.show()

# example: bar plot
#plt.bar(???, ???)
#plt.show()

# NOTE: each plot type has its own set of possible customizations, but
# there is overlap between them (e.g. you can change the color of the bar plot
# with ``color='red'``)
#
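
As a sketch of a different plot type, here is a bar plot of some made-up values (the numbers below are hypothetical and unrelated to the pressure data above). ``plt.bar`` takes the bar positions first and the bar heights second.

```python
import matplotlib.pyplot as plt

# made-up data: number of students attending each workshop session
session = [1, 2, 3, 4]
students = [12, 18, 9, 15]

# bar plot: positions along the x-axis, then bar heights
bars = plt.bar(session, students, color='red')

# add labels to the x and y axes
plt.xlabel('session')
plt.ylabel('number of students')

# show the plot
plt.show()
```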

## 3) Loading data files
Now that we can load packages and plot, let's try working with some data.

### 3.1) Ice cream sales vs shark attacks
Let's compare data on ice cream sales and shark attacks.

In [None]:
# to make things simple, we're going to use numpy to load the data

# load the data file
# - first column = number of ice cream sales
# - second column = number of shark attacks
#
data = np.genfromtxt('ice_cream_vs_shark_attacks.csv',
                     # columns are separated by commas (csv = comma separated values)
                     delimiter=',',
                     
                     # the first row of the file is the column names
                     # (i.e. the first row isn't data and so we can skip it)
                     skip_header=1)

# as a one liner
#data = np.genfromtxt('ice_cream_vs_shark_attacks.csv', delimiter=',', skip_header=1)
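
If you don't have the data file at hand, the same call can be tried on a small in-memory CSV. This is only a sketch: the column names and numbers below are made up, and ``StringIO`` simply makes a string behave like a file.

```python
import numpy as np
from io import StringIO

# made-up CSV text: one header row followed by three rows of data
csv_text = StringIO(u"ice_cream,shark_attacks\n10,1\n20,3\n30,4\n")

# same options as above: comma separated, skip the header row
sample = np.genfromtxt(csv_text, delimiter=',', skip_header=1)

# the result is a 2D array with one row per line of data
print(sample)
print(sample.shape)
```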

In [None]:
# view the data
print( data )

In [None]:
# check the length of the data
print( len(data) )

In [None]:
# NOTE: numpy arrays have a built-in attribute for the dimensions of
# the array (very useful if you're working with 2D, 3D, etc. data)
print( data.shape )

In [None]:
# check the type used to store the data
print( type(data) )
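
To see how ``len``, ``shape``, and ``type`` relate, here is a sketch on a small made-up array with 4 rows and 2 columns (the array ``example`` is hypothetical and only stands in for the loaded data):

```python
import numpy as np

# a made-up 2D array: 4 rows, 2 columns
example = np.zeros((4, 2))

# len() only counts the rows (the first dimension)
print(len(example))

# shape gives the size of every dimension: (rows, columns)
print(example.shape)

# the container is a numpy array
print(type(example))

# dtype is the type of the values stored inside the array
print(example.dtype)
```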

In [None]:
# to make things easier, we're going to split the data into
# two variables

# select all rows, but only the first column (column 0)
ice_cream = data[:, 0]

# select all rows, but only the second column (column 1)
shark_attacks = data[:, ???]
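
The ``[rows, columns]`` indexing used above works the same way on any 2D array. A sketch on a made-up array (``table`` is hypothetical), where ``:`` means "everything along this dimension":

```python
import numpy as np

# a made-up 2D array: 3 rows, 2 columns
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

# all rows, first column (column 0)
first_column = table[:, 0]
print(first_column)

# first row (row 0), all columns
first_row = table[0, :]
print(first_row)

# a single value: row 2, column 1
value = table[2, 1]
print(value)
```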

In [None]:
# create a quick plot of the data

# let's do a scatter plot
plt.scatter(ice_cream, ???)

# add labels to the x and y axes
plt.xlabel('???')
plt.ylabel('???')

# show the plot
plt.show()

**Question**: Does this mean ice cream makes people delicious to sharks? Or that shark attacks cause you to have a craving for Ben & Jerry's?
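
One way to put a number on how closely two quantities move together is the correlation coefficient, available as ``np.corrcoef``. The sketch below uses made-up numbers (not the workshop data). A value close to 1 means the two quantities rise together; it says nothing about whether one causes the other.

```python
import numpy as np

# made-up data for illustration only
ice_cream_sales = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
attacks = np.array([1.0, 2.0, 2.0, 4.0, 5.0])

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the
# correlation between the two variables
r = np.corrcoef(ice_cream_sales, attacks)[0, 1]
print(r)
```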

### 3.2) Lab data
Now let's try a more realistic example, closer to what you would do in an engineering lab class.

In [None]:
# first, we'll load a data file that has the results from an experiment

# consumption rate of oxygen during combustion
#
# load experimental data from a file (2 columns)
# - column 0 = temperature [K]
# - column 1 = consumption rate [moles/(s * m^2)]
#
data = np.genfromtxt('combustion.csv', delimiter=',', skip_header=1)

# temperature [K]
T = data[:, 0]

# consumption rate (J) [moles/(s * m^2)]
J = data[:, 1]

In [None]:
# plot the temperature vs consumption rate of oxygen

# use scatter plot since these are data points from an 
# experiment
plt.scatter(T, J, color='black')

# add labels
plt.xlabel('???')
plt.ylabel('???')

# show the plot
plt.show()

Now let's compare the experimental results (the data we just loaded and plotted) to a theoretical model.

**NOTE**: due to time constraints, we will only compare the theoretical and experimental results visually. In a real scenario, you should use statistical metrics to determine how well the theory and data agree.
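
As a sketch of what such a metric could look like, the root-mean-square error (RMSE) summarizes the typical difference between model and measurement in the units of the data. The two arrays below are made up for illustration; with real data, the model would need to be evaluated at the same temperatures as the measurements.

```python
import numpy as np

# made-up values for illustration only
measured = np.array([0.20, 0.18, 0.15, 0.14])
predicted = np.array([0.19, 0.17, 0.16, 0.14])

# root-mean-square error: square the differences, average them,
# then take the square root
rmse = np.sqrt(np.mean((measured - predicted) ** 2))
print(rmse)
```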

The consumption rate of oxygen for this experimental setup is described
by the following equation:

$$
J = c D_{12} \frac{1}{r_0} ,
$$

where $D_{12} = 1.71 \times 10^{-4}$ [m$^2$/s] is the mass diffusivity, $r_0$ = 0.001 [m] (i.e. 1 mm) is the radius, and $c$ [mol/m$^3$] is the molar concentration, calculated as:

$$
c = \frac{p}{R T} ,
$$
where $p$ is the pressure (10135.0 [Pa]), $R$ = 8.314 [J/(K mol)] is the ideal gas constant, and $T$ [K] is the temperature.
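
Substituting the second equation into the first gives $J$ directly as a function of temperature, which shows that the model predicts $J$ falling off as $1/T$. As a quick check of the magnitudes, here it is evaluated at $T$ = 1200 [K] (a temperature picked only for illustration):

$$
J = \frac{p D_{12}}{R T r_0}
  = \frac{10135.0 \times 1.71 \times 10^{-4}}{8.314 \times 1200 \times 0.001}
  \approx 0.174 \ \text{mol/(m}^2\,\text{s)}
$$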

In [None]:
# knowns:
# - p = 10135.0 Pa
# - R = 8.314 J/(K*mol)
# - r0 = 0.001 m
# - D12 = 1.71E-4 m^2/s

# need to calculate:
# - c in [moles/m^3] for a range of temperatures [K]
# - J in [moles/(m^2 * s)]

In [None]:
# define the knowns as variables:

# pressure [Pa]
p = 10135.0

# ideal gas constant [J/(mol*K)]
R = 8.314

# radius [m]
r0 = 0.001

# mass diffusivity [m^2/s]
D12 = 1.71E-4

In [None]:
# calculate c [mol/m^3] for a range of temperatures [K]

# create a vector of temperatures [K]
#
#   np.arange(min value, max value, step size)
#
# NOTE: the max value itself is not included in the result
#
T_theory = np.arange(1200.0, ???, 100.0)

# calculate c for the range of temperatures
c = p / (R * T_theory)

# then calculate J for a range of temperatures
# using the equations above
J_theory = ???

# check the results with a plot
plt.plot(T_theory, J_theory)
plt.show()
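
When choosing the values for ``np.arange``, keep in mind that it stops before the end value. A small sketch with made-up numbers, next to ``np.linspace``, which takes the number of points instead of a step size and does include the end value:

```python
import numpy as np

# from 0 up to (but not including) 50, in steps of 10
a = np.arange(0.0, 50.0, 10.0)
print(a)

# 6 evenly spaced points from 0 to 50, end value included
b = np.linspace(0.0, 50.0, 6)
print(b)
```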

In [None]:
# now let's plot the theoretical results against the experimental results

# plot the experimental results
plt.scatter(T, J, color='red')

# plot the theoretical results
plt.plot(???, ???, color='black')

# show the plot
plt.show()


# now that we have a basic plot, try to improve the formatting
# - colors
# - line styles and thicknesses
# - text labels
# - font sizes
#
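
As a sketch of the kind of formatting meant here, the example below overlays made-up points and a made-up curve (not the combustion data) and adds a legend. The ``label`` arguments and ``plt.legend`` are one way to add text labels that identify each series.

```python
import numpy as np
import matplotlib.pyplot as plt

# made-up "measurements" and a made-up "model" curve
x_data = np.array([1.0, 2.0, 3.0, 4.0])
y_data = np.array([1.1, 3.9, 9.2, 15.8])
x_model = np.linspace(1.0, 4.0, 50)
y_model = x_model ** 2

# points for the measurements, a line for the model
plt.scatter(x_data, y_data, color='red', label='experiment')
plt.plot(x_model, y_model, color='black', linewidth=2.0, label='theory')

# axis labels and a legend built from the ``label`` arguments
plt.xlabel('x', fontsize=14)
plt.ylabel('y', fontsize=14)
legend = plt.legend()

# show the plot
plt.show()
```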