# CIV1538: A Crash Course in Python for Transportation Demand Analysis

Python is a general-purpose programming language that is popular among data scientists and offers a variety of packages useful to transportation demand analysts. It is an interpreted language rather than a compiled one, so programs can run somewhat slower in CPU time. However, its ease of use means it often takes less time to learn and to write code in.

We will use the Anaconda distribution of Python. This distribution is very popular among data scientists and includes Python, the Spyder IDE, Jupyter notebooks, and RStudio.

The Anaconda software can be downloaded as an executable file (for Windows and Mac) from their website: https://www.anaconda.com/

Make sure you download a version that includes Python 3.7+ (required by Biogeme). This should be the default distribution provided with Anaconda. During the installation process, make sure the options 'Add Anaconda to my PATH environment variable' and 'Register Anaconda as my default Python 3.x' are selected.

# Python Data Structures

**Lists** – Lists are one of the most versatile data structures in Python. A list is defined by writing comma-separated values in square brackets. Lists may contain items of different types, but usually the items all have the same type. Python lists are mutable, meaning individual elements of a list can be changed.

In [1]:
my_list = [0,1,4,5,6,9,16,24]

In [2]:
my_list

[0, 1, 4, 5, 6, 9, 16, 24]

Individual elements of a list can be accessed by their index number.

In [3]:
my_list[0] # Give me the 0th element of the list

0

A range of elements can be accessed using the colon operator.

In [4]:
my_list[1:3] # Give me the 1st and 2nd elements of the list

[1, 4]

Negative indices can be used to access elements of the list from the end.

In [5]:
my_list[-2] # Give me the 2nd last element of the list

16
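
Because lists are mutable, elements can also be reassigned or appended in place. The following is a minimal sketch; the list `counts` is a made-up example and is not part of the cells above.

```python
# Hypothetical list, used only for illustration
counts = [10, 20, 30]

counts[0] = 15        # Replace the element at index 0
counts.append(40)     # Add a new element to the end of the list

print(counts)         # [15, 20, 30, 40]
print(len(counts))    # 4
```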

**Strings** – Strings can be defined using single ( ' ), double ( " ), or triple ( """ ) quotation marks. Strings enclosed in triple quotes ( """ ) can span multiple lines and are frequently used in docstrings (Python’s way of documenting functions). The backslash ( \ ) is used as an escape character. Note that Python strings are immutable, so you cannot change part of a string. Elements of strings can be accessed in the same way as list elements: a string behaves like a sequence of characters.

In [6]:
my_str = 'Hello'
print(my_str[1]) # Print the character at index 1 (the second character) of my_str
print(len(my_str)) # Use the len() function to return the number of characters in my_str
print(my_str, ' CIV1538 class') # Combine the variable my_str and a string in a print statement


e
5
Hello  CIV1538 class
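
Since strings are immutable, assigning to a single character raises an error, and a new string has to be built instead. A small sketch of this behaviour follows; the variable names `greeting` and `new_greeting` are invented for this illustration.

```python
greeting = 'Hello'

try:
    greeting[0] = 'J'          # Not allowed: strings are immutable
except TypeError as err:
    print('Cannot modify a string in place:', err)

# Build a new string from a replacement character and a slice of the old one
new_greeting = 'J' + greeting[1:]
print(new_greeting)            # Jello
```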


**Tuples** – A tuple consists of a number of values separated by commas. Tuples are immutable, and on output they are surrounded by parentheses so that nested tuples are interpreted correctly. Even though tuples are immutable, they can hold mutable data, such as lists, if needed.
Because tuples are immutable, they are generally somewhat faster to process than lists. Hence, if your collection is unlikely to change, consider using a tuple instead of a list.
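
The points above can be sketched as follows. The tuples `od_pair` and `record` and their values are hypothetical and serve only to illustrate the behaviour.

```python
# Hypothetical origin-destination pair stored as a tuple
od_pair = ('Toronto', 'Ottawa')
print(od_pair[0])                 # Elements are accessed by index, as with lists

# Tuples can be unpacked into separate variables
origin, destination = od_pair

# Reassigning an element of a tuple raises a TypeError
try:
    od_pair[0] = 'Montreal'
except TypeError as err:
    print('Tuples are immutable:', err)

# A tuple can still hold a mutable object, which can itself be changed
record = ('survey', [1, 2, 3])
record[1].append(4)
print(record)                     # ('survey', [1, 2, 3, 4])
```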

**Dictionary** – A dictionary is a collection of key: value pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries preserve the order in which keys were inserted. A pair of braces creates an empty dictionary: {}.
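
As a rough illustration of how dictionaries are created, read, and updated, consider the sketch below. The dictionary `mode_share` and its values are invented for this example and do not come from real data.

```python
# Hypothetical mode shares, invented for illustration
mode_share = {'car': 0.62, 'transit': 0.25, 'walk': 0.13}

print(mode_share['transit'])   # Look up a value by its key: 0.25

mode_share['bike'] = 0.05      # Add a new key: value pair
mode_share['car'] = 0.57       # Keys are unique, so this overwrites the old value

# Iterate over the key: value pairs
for mode, share in mode_share.items():
    print(mode, share)

empty = {}                     # A pair of braces creates an empty dictionary
print(len(empty))              # 0
```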

# Loops and Conditional Constructs
Like most languages, Python has a for loop, which is the most widely used method for iteration. It has a simple syntax:

In [10]:
for_list = ['Iterate', 'through', 'the', 'elements', 'of', 'this', 'list.']
for i in for_list:
  print(i)

Iterate
through
the
elements
of
this
list.


Conditional statements are used to execute code fragments based on a condition. The most commonly used construct is if-else.
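
The general shape of an if-else construct can be sketched as follows, with an optional `elif` branch for additional conditions. The variable `travel_time` and the thresholds are hypothetical values chosen only for illustration.

```python
# Hypothetical travel time in minutes, chosen only for illustration
travel_time = 35

if travel_time < 15:
    category = 'short'
elif travel_time < 45:
    category = 'medium'
else:
    category = 'long'

print(category)   # medium
```

As with for loops, the body of each branch is defined by its indentation.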

# Python Libraries
Let's take one step further in our journey to learn Python by getting acquainted with some useful libraries. The first step is to learn how to import them into our environment. There are several ways of doing so in Python: importing a whole module, importing it under an alias, or importing specific names from it.
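
A minimal sketch of the common import styles is given below. It uses the standard-library `math` module as a stand-in, since it is available in every Python installation; the same patterns apply to the libraries listed in this section.

```python
import math                 # Import the whole module; names are accessed with the module prefix
import math as m            # Import the module under a shorter alias
from math import sqrt, pi   # Import only specific names from the module

print(math.sqrt(16))   # 4.0
print(m.floor(2.7))    # 2
print(sqrt(9))         # 3.0
print(pi)
```

Importing under an alias keeps the origin of each function visible while saving typing, which is why it is a common choice for frequently used libraries.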

The following libraries are useful for scientific computation and data analysis (most are already included in Anaconda distributions of Python):

**NumPy** stands for Numerical Python. The most powerful feature of NumPy is its n-dimensional array. This library also contains basic linear algebra functions, Fourier transforms, advanced random number capabilities, and tools for integration with other low-level languages like Fortran, C, and C++.

**SciPy** stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules such as discrete Fourier transforms, linear algebra, optimization, and sparse matrices.

**Matplotlib** for plotting a vast variety of graphs, from histograms to line plots to heat maps. You can use the Pylab feature in the IPython notebook (ipython notebook --pylab=inline) to use these plotting features inline. If you omit the inline option, pylab converts the IPython environment into one very similar to Matlab. In current Jupyter notebooks, the %matplotlib inline magic command serves the same purpose. You can also use LaTeX commands to add math to your plot.

**Pandas** for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to the Python ecosystem and has been instrumental in boosting Python’s usage in the data science community.

**Scikit Learn** for machine learning. Built on NumPy, SciPy, and Matplotlib, this library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.

**Statsmodels** for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.

**Biogeme** for estimating discrete choice models by maximum likelihood.

## References
https://personalpages.manchester.ac.uk/staff/stefan.guettel/py/getting_started.pdf