<a href="https://colab.research.google.com/github/stb2145/cig/blob/master/Week_5_Salinity_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Week 2: Pandas, NumPy, SciPy**

**Learning Goals**
- Learn about and `import` [Python libraries](http://swcarpentry.github.io/python-novice-gapminder/06-libraries/index.html) (20 min)
- Learn about [Pandas](http://swcarpentry.github.io/python-novice-gapminder/07-reading-tabular/index.html) (40 min)
- Learn about SciPy (30 min)

### **Icebreaker!**
> **How many oceans are there?** 🤔...

##### Sort of a trick question...

# **Solution to Week 1 exercises**



# **What are Python Libraries?**
## Most of the power of a programming language is in its libraries.

> A Python package is a collection (directory) of Python modules. In other words, it is a library of Python files, and each file contains code that defines specific functions.

<img src='https://drive.google.com/uc?id=1C7Y1p1Nlj0QhLqEPUGMoU6FP1QWMqrFU' width="520" height="300" />

- A library is a collection of files (called modules) that contains functions for use by other programs.
 - May also contain data values (e.g., numerical constants) and other things.
 - Library’s contents are supposed to be related, but there’s no way to enforce that.
- The Python [standard library](https://docs.python.org/3/library/) is an extensive suite of modules that comes with Python itself.
- Many additional libraries are available from [PyPI](https://pypi.org/) (the Python Package Index).
- We will see later how to write new libraries.



> **Libraries and Modules**
>
> A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.

## A program must import a library module before using it.

- Use `import` to load a library module into a program’s memory.
- Then refer to things from the module as `module_name.thing_name`.
 - Python uses `.` to mean “part of”.
- Using `numpy`, a third-party library that must be installed separately (it is not part of the standard library):

In [None]:
import numpy

print('pi is', numpy.pi)
print('cos(pi) is', numpy.cos(numpy.pi))

> Have to refer to each item with the module’s name.
>> `numpy.cos(pi)` won’t work: the reference to `pi` doesn’t somehow “inherit” the function’s reference to `numpy`
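
A minimal sketch of the point above: the bare name `pi` is never defined in the program, so Python raises a `NameError`, while the fully qualified `numpy.pi` works. The variable names `message` and `result` are only for illustration.

```python
import numpy

# The bare name `pi` was never defined in this program, so this call fails.
try:
    numpy.cos(pi)
except NameError as err:
    message = str(err)
    print('NameError:', message)

# Qualifying the name with the module works.
result = numpy.cos(numpy.pi)
print('cos(pi) is', result)
```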

## Use `help` to learn about the contents of a library module.

> [Numpy Documentation](https://numpy.org/doc/)

In [None]:
import math
help(math)

### Difference between `math` and `numpy`
[from StackOverflow](https://stackoverflow.com/questions/41648058/what-is-the-difference-between-import-numpy-and-import-math)

- Use `math` if you are doing simple computations with scalars only (no lists or arrays).
> `math` is part of the Python standard library. It provides functions for basic mathematical operations as well as some commonly used constants.

- Use `numpy` if you are doing scientific computations with matrices, arrays, or large datasets.
> `numpy`, on the other hand, is a third-party package geared towards scientific computing. It is the de facto package for numerical and vector operations in Python. It provides routines optimized for vector and array computations and, as a result, is much faster for such operations than plain Python lists. See http://www.numpy.org/ for more info.
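
To make the difference concrete, here is a small sketch (the array values are made up for illustration): `math.sqrt` handles one scalar at a time, while `np.sqrt` is applied to every element of an array.

```python
import math
import numpy as np

# math works on one scalar at a time
scalar_root = math.sqrt(144)

# numpy applies the same operation to every element of an array
values = np.array([1.0, 4.0, 9.0, 144.0])
array_roots = np.sqrt(values)

print(scalar_root)
print(array_roots)

# math.sqrt cannot take a multi-element array
try:
    math.sqrt(values)
except TypeError as err:
    print('TypeError:', err)
```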

## Import specific items from a library module to shorten programs.

In [None]:
from numpy import cos, pi

print('cos(pi) is', cos(pi))

## Create an alias for a library module when importing it to shorten programs.

> Use `import ... as ...` to give a library a short alias while importing it.
>
> Then refer to items in the library using that shortened name.

In [None]:
import numpy as np

print('cos(pi) is', np.cos(np.pi))

### Essential Python libraries:

- `numpy`
- `pandas`
- `matplotlib`

### Create alias for these libraries:

In [2]:
import numpy as np
import pandas as pd
import scipy 

## A few exercises:

## 1)

When a colleague of yours types `help(math)`, Python reports an error:

`NameError: name 'math' is not defined`

What has your colleague forgotten to do?

In [None]:
#type solution here


## 2)
Take the square root of a 4x4 2-D array with the number 144 in the diagonals and 0's elsewhere.

In [None]:
# np.eye(4) builds a 4x4 identity matrix; scaling it puts 144 on the diagonal
a = 144 * np.eye(4)
np.sqrt(a)

## 3)
Create a 1D array of numbers going from 0 to 20 with 2 as the step count.

What is the length of this object?

In [None]:
# np.arange excludes the stop value, so this yields 0, 2, ..., 18
x = np.arange(0, 20, 2)
len(x)

# **Pandas!**

<img width="400" src='https://miro.medium.com/max/1400/1*KdxlBR9P3mDp9JZ_URMdYQ.jpeg'>

## No but seriously, `pandas` is a powerful Python library for efficient, high-performance analysis (typically statistics) of _tabular_ data (i.e. spreadsheet-style data, as in Excel).

> Pandas is built on top of the Numpy library, which in practice means that most of the methods defined for Numpy Arrays apply to Pandas Series/DataFrames.
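
As a rough illustration of that relationship (the three salinity values are reused from later in this lesson, and the variable names are arbitrary), a NumPy function can be applied directly to a Series, and the result keeps the index:

```python
import numpy as np
import pandas as pd

salinity = pd.Series([32.0, 35.0, 34.5], index=['Arctic', 'Atlantic', 'Indian'])

# A NumPy function applied to a Series returns a Series with the same index
rounded = np.floor(salinity)
print(rounded)

# The underlying data can be pulled out as a plain NumPy array
as_array = salinity.to_numpy()
print(type(as_array))
```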

### Pandas Capabilities

[Documentation](https://pandas.pydata.org/pandas-docs/stable/)



- A fast and efficient DataFrame object for data manipulation with integrated indexing;
- Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
- Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
- Flexible reshaping and pivoting of data sets;
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
- Columns can be inserted and deleted from data structures for size mutability;
- Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
- High performance merging and joining of data sets;
- Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
- Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
- Highly optimized for performance, with critical code paths written in Cython or C.
- Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.

### **There are two main data structures in pandas**

> **Data Series**: 1-dimensional array of values with an index
>
> **Data Frame**: 2-dimensional array of values with a row and a column index

> A DataFrame is a collection of Series: the DataFrame is how pandas represents a table, and a Series is the data structure pandas uses to represent a column.
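
A quick sketch of that idea, using a hypothetical two-person table (the names and numbers are invented for illustration): pulling one column out of a DataFrame gives back a Series that shares the table's row index.

```python
import pandas as pd

# Hypothetical example data: two people with an age and a height
table = pd.DataFrame({'age': [30, 25], 'height_cm': [170.0, 182.5]},
                     index=['Ana', 'Ben'])

# Selecting a single column returns a Series
column = table['age']
print(type(table).__name__)
print(type(column).__name__)

# The column keeps the same row labels as the table
same_index = column.index.equals(table.index)
print(same_index)
```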


## Data series (left), Data Frame (right)

Introduce pd frame and series with class age and height, with the names as index and age/height as values.

<img width="500" src='https://miro.medium.com/max/1400/1*o5c599ueURBTZWDGmx1SiA.png'>

> Anytime you need more information on a package or function, append `?` to its name (this works in Jupyter/IPython).

In [None]:
pd?

## Pandas Data Series:

In [None]:
#create index
names = ['Shanice', 'Jonathan', 'Rahim']
#create data values
ages = [121, 90, 83]
#height = []

In [None]:
club = pd.Series(data=ages, index=names)  #create pandas series

> You can use many statistical functions on both Series and DataFrames.

In [None]:
#oldest age in your series
club.max()

In [None]:
#youngest age value in your series
club.min()

In [None]:
#average age in the whole club
club.mean()

> Ocean basins pandas series

In [None]:
ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
ds = pd.Series(data=avg_salinity, index=ocean_basins, name="Ocean basins' average salinities")

In [None]:
ds

In [None]:
# If you're not sure what the index of your pd series is:
ds.index

In [None]:
# If you're not sure what the values of your pd series are:
ds.values

> Find the freshest ocean basin(s)

In [None]:
#first find the minimum salinity value
ds.min()

In [None]:
#next find the index associated with that salinity value
ds[ds == 32.0]

In [None]:
#another way to write the same code!
ds[ds == ds.min()]
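
As an alternative sketch, `Series.idxmin()` returns the index label of the minimum directly. Note that it reports only the first label if several basins tie, whereas the Boolean comparison above returns all of them. The Series is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
ds = pd.Series(data=avg_salinity, index=ocean_basins)

# idxmin gives the label of the smallest value
freshest = ds.idxmin()
print(freshest, ds[freshest])
```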

> Find the saltiest ocean basin(s)

In [None]:
#your code here

In [None]:
#your code here

## Pandas Data Frame:

In [19]:
#first create a dictionary
ocean_basins = ['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern']
avg_salinity = [32, 35, 34.5, 35, 34.7]
avg_temp = [-1.8, 14, 22, 20, 4]

avg_data = {'avg_salinity': avg_salinity,
        'avg_temp': avg_temp}


df = pd.DataFrame(data=avg_data, index=ocean_basins)

In [20]:
df

          avg_salinity  avg_temp
Arctic            32.0      -1.8
Atlantic          35.0      14.0
Indian            34.5      22.0
Pacific           35.0      20.0
Southern          34.7       4.0


In [None]:
df.info()

> You can use many statistical functions on both Series and DataFrames.

In [None]:
df.min()

In [None]:
df.max()

In [None]:
df.mean()

> Or, if you want all the basic stats, you can call `describe()`


In [None]:
df.describe()

> We can get a single column as a Series using python's getitem syntax on the DataFrame object.

In [None]:
df['avg_salinity']

> or using attribute syntax.

In [None]:
df.avg_salinity

## Indexing & Slicing

- Use `DataFrame.iloc[..., ...]` to select values by their (entry)  **position**

- Use `DataFrame.loc[..., ...]` to select values by their (entry) **label**

In [5]:
df.loc['Southern']

avg_salinity    34.7
avg_temp         4.0
Name: Southern, dtype: float64

In [6]:
df.iloc[-1]

avg_salinity    34.7
avg_temp         4.0
Name: Southern, dtype: float64

> we can also specify the column we want to access

In [7]:
df.loc['Southern', 'avg_temp']

4.0

In [8]:
df.iloc[-1,1]

4.0

> If we make a calculation using columns from the DataFrame, it will keep the same index:

In [None]:
df.avg_salinity * df.avg_temp

> Which we can easily add as another column to the DataFrame:

In [28]:
df['TS'] = df.avg_salinity * df.avg_temp

In [29]:
df.TS.mean()

406.04

In [30]:
df

          avg_salinity  avg_temp     TS
Arctic            32.0      -1.8  -57.6
Atlantic          35.0      14.0  490.0
Indian            34.5      22.0  759.0
Pacific           35.0      20.0  700.0
Southern          34.7       4.0  138.8


> Now let's add a row to the Dataframe:

In [35]:
#create new DataFrame object of global averages
values = {'avg_salinity':df.avg_salinity.mean(), 
          'avg_temp':df.avg_temp.mean(), 'TS':df.TS.mean()}
index = ['Global']
globe = pd.DataFrame(data=values, index=index)
globe

        avg_salinity  avg_temp      TS
Global         34.24     11.64  406.04


In [40]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
df_new = pd.concat([df, globe])

In [41]:
df_new

          avg_salinity  avg_temp      TS
Arctic           32.0      -1.8    -57.6
Atlantic         35.0      14.0    490.0
Indian           34.5      22.0    759.0
Pacific          35.0      20.0    700.0
Southern         34.7       4.0    138.8
Global          34.24     11.64   406.04


> **Find the following in this dataframe:**

In [None]:
#What ocean basin has the coldest average temperature? What is that temperature?
#idxmin() gives the basin label, min() gives the temperature itself
df.avg_temp.idxmin(), df.avg_temp.min()

In [None]:
df.avg_salinity.plot(kind='bar')

In [None]:
df.plot(kind='bar')

## A few exercises: GDP per capita in Europe

In [52]:
url = 'https://raw.githubusercontent.com/swcarpentry/python-novice-gapminder/gh-pages/data/gapminder_gdp_europe.csv'
gdp = pd.read_csv(url, index_col='country')

In [48]:
gdp.head(10)

country,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
Albania,1601.056136,1942.284244,2312.888958,2760.196931,3313.422188,3533.00391,3630.880722,3738.932735,2497.437901,3193.054604,4604.211737,5937.029526
Austria,6137.076492,8842.59803,10750.72111,12834.6024,16661.6256,19749.4223,21597.08362,23687.82607,27042.01868,29095.92066,32417.60769,36126.4927
Belgium,8343.105127,9714.960623,10991.20676,13149.04119,16672.14356,19117.97448,20979.84589,22525.56308,25575.57069,27561.19663,30485.88375,33692.60508
Bosnia and Herzegovina,973.533195,1353.989176,1709.683679,2172.352423,2860.16975,3528.481305,4126.613157,4314.114757,2546.781445,4766.355904,6018.975239,7446.298803
Bulgaria,2444.286648,3008.670727,4254.337839,5577.0028,6597.494398,7612.240438,8224.191647,8239.854824,6302.623438,5970.38876,7696.777725,10680.79282
Croatia,3119.23652,4338.231617,5477.890018,6960.297861,9164.090127,11305.38517,13221.82184,13822.58394,8447.794873,9875.604515,11628.38895,14619.22272
Czech Republic,6876.14025,8256.343918,10136.86713,11399.44489,13108.4536,14800.16062,15377.22855,16310.4434,14297.02122,16048.51424,17596.21022,22833.30851
Denmark,9692.385245,11099.65935,13583.31351,15937.21123,18866.20721,20422.9015,21688.04048,25116.17581,26406.73985,29804.34567,32166.50006,35278.41874
Finland,6424.519071,7545.415386,9371.842561,10921.63626,14358.8759,15605.42283,18533.15761,21141.01223,20647.16499,23723.9502,28204.59057,33207.0844
France,7029.809327,8662.834898,10560.48553,12999.91766,16107.19171,18292.63514,20293.89746,22066.44214,24703.79615,25889.78487,28926.03234,30470.0167


In [53]:
# Select Austria by entry position
gdp.iloc[1]

gdpPercap_1952     6137.076492
gdpPercap_1957     8842.598030
gdpPercap_1962    10750.721110
gdpPercap_1967    12834.602400
gdpPercap_1972    16661.625600
gdpPercap_1977    19749.422300
gdpPercap_1982    21597.083620
gdpPercap_1987    23687.826070
gdpPercap_1992    27042.018680
gdpPercap_1997    29095.920660
gdpPercap_2002    32417.607690
gdpPercap_2007    36126.492700
Name: Austria, dtype: float64

In [55]:
# Select Austria by entry label
gdp.loc['Austria']

gdpPercap_1952     6137.076492
gdpPercap_1957     8842.598030
gdpPercap_1962    10750.721110
gdpPercap_1967    12834.602400
gdpPercap_1972    16661.625600
gdpPercap_1977    19749.422300
gdpPercap_1982    21597.083620
gdpPercap_1987    23687.826070
gdpPercap_1992    27042.018680
gdpPercap_1997    29095.920660
gdpPercap_2002    32417.607690
gdpPercap_2007    36126.492700
Name: Austria, dtype: float64

In [56]:
# Select all the columns in the Denmark row (two ways to do this)
gdp.loc['Denmark', :]

gdpPercap_1952     9692.385245
gdpPercap_1957    11099.659350
gdpPercap_1962    13583.313510
gdpPercap_1967    15937.211230
gdpPercap_1972    18866.207210
gdpPercap_1977    20422.901500
gdpPercap_1982    21688.040480
gdpPercap_1987    25116.175810
gdpPercap_1992    26406.739850
gdpPercap_1997    29804.345670
gdpPercap_2002    32166.500060
gdpPercap_2007    35278.418740
Name: Denmark, dtype: float64

In [57]:
gdp.loc['Denmark']

gdpPercap_1952     9692.385245
gdpPercap_1957    11099.659350
gdpPercap_1962    13583.313510
gdpPercap_1967    15937.211230
gdpPercap_1972    18866.207210
gdpPercap_1977    20422.901500
gdpPercap_1982    21688.040480
gdpPercap_1987    25116.175810
gdpPercap_1992    26406.739850
gdpPercap_1997    29804.345670
gdpPercap_2002    32166.500060
gdpPercap_2007    35278.418740
Name: Denmark, dtype: float64

In [58]:
# Select all the countries (rows) in the 4th column (two ways to do this)
gdp.iloc[:, 3]

country
Albania                    2760.196931
Austria                   12834.602400
Belgium                   13149.041190
Bosnia and Herzegovina     2172.352423
Bulgaria                   5577.002800
Croatia                    6960.297861
Czech Republic            11399.444890
Denmark                   15937.211230
Finland                   10921.636260
France                    12999.917660
Germany                   14745.625610
Greece                     8513.097016
Hungary                    9326.644670
Iceland                   13319.895680
Ireland                    7655.568963
Italy                     10022.401310
Montenegro                 5907.850937
Netherlands               15363.251360
Norway                    16361.876470
Poland                     6557.152776
Portugal                   6361.517993
Romania                    6470.866545
Serbia                     7991.707066
Slovak Republic            8412.902397
Slovenia                   9405.489397
Spain            

In [61]:
gdp['gdpPercap_1967']

country
Albania                    2760.196931
Austria                   12834.602400
Belgium                   13149.041190
Bosnia and Herzegovina     2172.352423
Bulgaria                   5577.002800
Croatia                    6960.297861
Czech Republic            11399.444890
Denmark                   15937.211230
Finland                   10921.636260
France                    12999.917660
Germany                   14745.625610
Greece                     8513.097016
Hungary                    9326.644670
Iceland                   13319.895680
Ireland                    7655.568963
Italy                     10022.401310
Montenegro                 5907.850937
Netherlands               15363.251360
Norway                    16361.876470
Poland                     6557.152776
Portugal                   6361.517993
Romania                    6470.866545
Serbia                     7991.707066
Slovak Republic            8412.902397
Slovenia                   9405.489397
Spain            

In [63]:
# Select multiple rows and columns using .iloc and a positional slice
gdp.iloc[20:25, 0:4]

                 gdpPercap_1952  gdpPercap_1957  gdpPercap_1962  gdpPercap_1967
country
Portugal            3068.319867     3774.571743     4727.954889     6361.517993
Romania             3144.613186     3943.370225     4734.997586     6470.866545
Serbia              3581.459448     4981.090891     6289.629157     7991.707066
Slovak Republic     5074.659104      6093.26298     7481.107598     8412.902397
Slovenia            4215.041741     5862.276629     7402.303395     9405.489397


> Can do same statistical operations on the slices

In [66]:
gdp.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country
Italy            8243.58234     10022.40131     12269.27378
Montenegro      4649.593785     5907.850937     7778.414017
Netherlands     12790.84956     15363.25136     18794.74567
Norway          13450.40151     16361.87647     18965.05551
Poland          5338.752143     6557.152776     8006.506993
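
One detail worth noticing in the `.loc` slice above: label slices include the end label ('Poland' is part of the result), while `.iloc` position slices stop before the end position, like ordinary Python slices. A small sketch on a made-up frame (the temperatures are reused from the ocean basin example):

```python
import pandas as pd

df = pd.DataFrame({'avg_temp': [-1.8, 14, 22, 20, 4]},
                  index=['Arctic', 'Atlantic', 'Indian', 'Pacific', 'Southern'])

# Label slice: both endpoints are included
by_label = df.loc['Atlantic':'Pacific']

# Position slice: stops before position 3
by_position = df.iloc[1:3]

print(list(by_label.index))
print(list(by_position.index))
```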


In [67]:
# Find the maximum gdp values within Italy through Poland, 1962-1972
print(gdp.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].max())

gdpPercap_1962    13450.40151
gdpPercap_1967    16361.87647
gdpPercap_1972    18965.05551
dtype: float64


In [68]:
# Find the minimum gdp values within Italy through Poland, 1962-1972
print(gdp.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'].min())

gdpPercap_1962    4649.593785
gdpPercap_1967    5907.850937
gdpPercap_1972    7778.414017
dtype: float64


>Use comparisons to select data based on value.
  - Comparison is applied element by element.
  - Returns a similarly-shaped dataframe of `True` and `False`.

In [69]:
# Use a subset of data to keep output readable.
subset = gdp.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972']
print('Subset of data:\n', subset)

# Which values were greater than 10000 ?
print('\nWhere are values large?\n', subset > 10000)

Subset of data:
              gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy           8243.582340    10022.401310    12269.273780
Montenegro      4649.593785     5907.850937     7778.414017
Netherlands    12790.849560    15363.251360    18794.745670
Norway         13450.401510    16361.876470    18965.055510
Poland          5338.752143     6557.152776     8006.506993

Where are values large?
              gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy                 False            True            True
Montenegro            False           False           False
Netherlands            True            True            True
Norway                 True            True            True
Poland                False           False           False


> Select values or NaN using a Boolean mask.
  - A frame full of Booleans is sometimes called a mask because of how it can be used.


In [70]:
mask = subset > 10000
print(subset[mask])

             gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
country                                                    
Italy                   NaN     10022.40131     12269.27378
Montenegro              NaN             NaN             NaN
Netherlands     12790.84956     15363.25136     18794.74567
Norway          13450.40151     16361.87647     18965.05551
Poland                  NaN             NaN             NaN


- Get the value where the mask is true, and NaN (Not a Number) where it is false.
- Useful because NaNs are ignored by operations like max, min, average, etc.
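
A minimal sketch of that behaviour, using a made-up three-value Series: pandas skips NaN by default when computing statistics, and `skipna=False` turns that off.

```python
import numpy as np
import pandas as pd

values = pd.Series([12790.8, np.nan, 13450.4])

# By default the NaN is ignored, so the mean is over the two real values
mean_skipping_nan = values.mean()
n_valid = values.count()
print(mean_skipping_nan, n_valid)

# With skipna=False the NaN propagates into the result
mean_with_nan = values.mean(skipna=False)
print(mean_with_nan)
```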


In [71]:
print(subset[subset > 10000].describe())

       gdpPercap_1962  gdpPercap_1967  gdpPercap_1972
count        2.000000        3.000000        3.000000
mean     13120.625535    13915.843047    16676.358320
std        466.373656     3408.589070     3817.597015
min      12790.849560    10022.401310    12269.273780
25%      12955.737547    12692.826335    15532.009725
50%      13120.625535    15363.251360    18794.745670
75%      13285.513523    15862.563915    18879.900590
max      13450.401510    16361.876470    18965.055510


### Group By: split-apply-combine

> Pandas' vectorized methods and grouping operations give users a great deal of flexibility in analysing their data.

> For instance, let’s say we want a clearer view of how the European countries split according to their GDP.
  - We can get a first look by splitting the countries into two groups for each year surveyed: those with a GDP higher than the European average and those with a lower GDP.
  - We then estimate a wealth score from the historical values (1952 to 2007), counting how many times a country fell in the higher-GDP group and dividing by the number of years surveyed.
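
Before applying this to the real data, here is a toy sketch of split-apply-combine. The four "countries", their values, and the `low`/`high` labels are all hypothetical: rows are split by a label Series, `sum` is applied to each group, and the results are combined into one table.

```python
import pandas as pd

# Hypothetical toy data: four "countries" and two survey years
toy = pd.DataFrame({'gdp_1952': [1.0, 2.0, 3.0, 4.0],
                    'gdp_2007': [2.0, 4.0, 6.0, 8.0]},
                   index=['A', 'B', 'C', 'D'])

# One group label per row, aligned on the same index
group_label = pd.Series(['low', 'low', 'high', 'high'], index=toy.index)

# Split by label, apply sum to each group, combine into one table
totals = toy.groupby(group_label).sum()
print(totals)
```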

In [72]:
mask_higher = gdp > gdp.mean()
wealth_score = mask_higher.aggregate('sum', axis=1) / len(gdp.columns)
wealth_score

country
Albania                   0.000000
Austria                   1.000000
Belgium                   1.000000
Bosnia and Herzegovina    0.000000
Bulgaria                  0.000000
Croatia                   0.000000
Czech Republic            0.500000
Denmark                   1.000000
Finland                   1.000000
France                    1.000000
Germany                   1.000000
Greece                    0.333333
Hungary                   0.000000
Iceland                   1.000000
Ireland                   0.333333
Italy                     0.500000
Montenegro                0.000000
Netherlands               1.000000
Norway                    1.000000
Poland                    0.000000
Portugal                  0.000000
Romania                   0.000000
Serbia                    0.000000
Slovak Republic           0.000000
Slovenia                  0.333333
Spain                     0.333333
Sweden                    1.000000
Switzerland               1.000000
Turkey      

> Finally, for each group in the wealth_score table, we sum their (financial) contribution across the years surveyed using chained methods:

In [74]:
gdp.groupby(wealth_score).sum()

wealth_score,gdpPercap_1952,gdpPercap_1957,gdpPercap_1962,gdpPercap_1967,gdpPercap_1972,gdpPercap_1977,gdpPercap_1982,gdpPercap_1987,gdpPercap_1992,gdpPercap_1997,gdpPercap_2002,gdpPercap_2007
0.0,36916.8542,46110.918793,56850.065437,71324.848786,88569.346898,104459.358438,113553.768507,119649.599409,92380.047256,103772.937598,118590.929863,149577.357928
0.333333,16790.046878,20942.4568,25744.935321,33567.66767,45277.839976,53860.45675,59679.63402,64436.91296,67918.09322,80876.05158,102086.79521,122803.72952
0.5,11807.544405,14505.00015,18380.44947,21421.8462,25377.72738,29056.14537,31914.71205,35517.67822,36310.66608,40723.5387,45564.30839,51403.02821
1.0,104317.27756,127332.008735,149989.154201,178000.35004,215162.34314,241143.41273,263388.78196,296825.13121,315238.23597,346930.92617,385109.93921,427850.33342


# **SciPy**

## SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. In particular, these are some of the core packages:
  - SciPy
  - NumPy
  - Pandas
  - Matplotlib
  - SymPy

Check out their [User Guide](https://docs.scipy.org/doc/scipy/reference/tutorial/index.html)
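
As a small taste of the SciPy library itself, the sketch below fits a straight line with `scipy.stats.linregress`. The data are synthetic (generated from y = 2x + 1 purely for illustration), so the fitted slope and intercept should recover those two numbers.

```python
import numpy as np
from scipy import stats

# Synthetic data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares linear regression of y on x
fit = stats.linregress(x, y)
print(fit.slope, fit.intercept)
```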