# Chapter 10

*Modeling and Simulation in Python* Chun San Yip Summer 2021

Copyright 2021 Allen Downey

License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/)

In [1]:
# check if the libraries we need are installed

try:
    import pint
except ImportError:
    !pip install pint
    import pint
    
try:
    from modsim import *
except ImportError:
    !pip install modsimpy
    from modsim import *


### Under the hood

To get a `DataFrame` and a `Series`, I'll read the world population data and select a column.

`DataFrame` and `Series` both have an attribute called `shape`, a tuple that gives the size along each dimension: rows and columns for a `DataFrame`, rows only for a `Series`.

In [None]:
import os

filename = 'World_population_estimates.html'

if not os.path.exists(filename):
    !wget https://raw.githubusercontent.com/AllenDowney/ModSimPy/master/data/World_population_estimates.html

In [None]:
from pandas import read_html

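# header=0: the first row holds the column names
# index_col=0: the first column (the year) becomes the index
# decimal='M': treat 'M' as the decimal point (presumably so entries written with an 'M' still parse as numbers)
tables = read_html(filename, header=0, index_col=0, decimal='M')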
table2 = tables[2]
table2.columns = ['census', 'prb', 'un', 'maddison', 
                  'hyde', 'tanton', 'biraben', 'mj', 
                  'thomlinson', 'durand', 'clark']
table2.shape

In [None]:
census = table2.census / 1e9
census.shape

In [None]:
un = table2.un / 1e9
un.shape
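To make the difference between the two shapes concrete, here is a minimal sketch on a toy `DataFrame`. The name `toy` and the numbers in it are made up for illustration; they are not taken from the population table.

```python
import pandas as pd

# Toy data (made-up values): 3 rows, 2 columns.
toy = pd.DataFrame(
    {'census': [2.5, 3.0, 3.7], 'un': [2.5, 3.0, 3.7]},
    index=[1950, 1960, 1970],
)

frame_shape = toy.shape           # (rows, columns)
series_shape = toy.census.shape   # (rows,): a tuple with one element

print(frame_shape)
print(series_shape)
```

Selecting a column keeps the number of rows and drops the column dimension, which is why the `Series` shape has only one entry.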

A `DataFrame` contains `index`, which labels the rows.  Here it is an `Int64Index`, which is similar to a NumPy array of integers.  (Pandas 2.0 and later report a plain `Index` with an integer dtype instead.)

In [None]:
table2.index

And `columns`, which labels the columns.

In [None]:
table2.columns

And `values`, which is an array of values.

In [None]:
table2.values
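The three parts fit together like this: `values` holds the numbers, and `index` and `columns` say which row and column each number belongs to. The sketch below uses a made-up two-by-two frame called `toy` rather than the real table.

```python
import pandas as pd

# Toy data (made-up values), labeled by year and by source.
toy = pd.DataFrame(
    {'census': [2.5, 3.0], 'un': [2.6, 3.1]},
    index=[1950, 1960],
)

row_labels = list(toy.index)        # labels for the rows
column_labels = list(toy.columns)   # labels for the columns
values = toy.values                 # 2x2 NumPy array of the numbers

# values[i, j] is the entry for row_labels[i] and column_labels[j],
# so position [1, 0] matches the label-based lookup (1960, 'census').
same = values[1, 0] == toy.loc[1960, 'census']
print(row_labels, column_labels, same)
```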

A `Series` does not have `columns`, but it does have `name`.

In [None]:
census.name

It contains `values`, which is an array.

In [None]:
census.values

And it contains `index`:

In [None]:
census.index
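As a small illustration of where these attributes come from, the sketch below selects a column from a made-up frame called `toy` and divides it by `1e9`, mirroring how `census` was built above. The numbers are invented.

```python
import pandas as pd

# Toy data (made-up values), in units of people.
toy = pd.DataFrame(
    {'census': [2.5e9, 3.0e9], 'un': [2.6e9, 3.1e9]},
    index=[1950, 1960],
)

# Selecting a column yields a Series named after that column.
scaled = toy.census / 1e9

series_name = scaled.name                     # the column label
shares_index = scaled.index.equals(toy.index) # same row labels as the frame
billions = list(scaled.values)                # plain numbers, now in billions
print(series_name, shares_index, billions)
```

Dividing by a scalar keeps both the name and the index, so the `Series` stays aligned with the `DataFrame` it came from.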

If you ever wonder what kind of object a variable refers to, you can use the `type` function.  The result indicates what type the object is, and the module where that type is defined.

`DataFrame`, `Int64Index`, `Index`, and `Series` are defined by Pandas.

`ndarray` is defined by NumPy.

In [None]:
type(table2)

In [None]:
type(table2.index)

In [None]:
type(table2.columns)

In [None]:
type(table2.values)

In [None]:
type(census)

In [None]:
type(census.index)

In [None]:
type(census.values)
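If you want the type name and the defining package as separate pieces, one option is to read them from the type object itself. The helper `describe` below is a hypothetical convenience written for this sketch, not part of Pandas or `modsim`.

```python
import pandas as pd

def describe(obj):
    """Return (type name, top-level package) for any object."""
    t = type(obj)
    return t.__name__, t.__module__.split('.')[0]

# Toy Series (made-up values).
series = pd.Series([1.0, 2.0], name='toy')

series_info = describe(series)          # type defined by Pandas
values_info = describe(series.values)   # type defined by NumPy
index_info = describe(series.index)     # exact index type depends on the Pandas version

print(series_info, values_info, index_info)
```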

## Optional exercise

The following exercise provides a chance to practice what you have learned so far, and maybe develop a different growth model.  If you feel comfortable with what we have done so far, you might want to give it a try.

**Optional Exercise:** On the Wikipedia page about world population estimates, the first table contains estimates for prehistoric populations.  The following cells process this table and plot some of the results.

In [None]:
filename = 'World_population_estimates.html'
tables = read_html(filename, header=0, index_col=0, decimal='M')
len(tables)

Select `tables[1]`, which is the second table on the page.

In [None]:
table1 = tables[1]
table1.head()

Not all agencies and researchers provided estimates for the same dates.  Again `NaN` is the special value that indicates missing data.

In [None]:
table1.tail()
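To see how much of a table is missing, it can help to count the `NaN` entries per column, or to drop them from a single column. This is a minimal sketch on a made-up frame called `toy`; the column names echo the sources above, but the numbers are invented.

```python
import numpy as np
import pandas as pd

# Toy data (made-up values): not every source covers every date.
toy = pd.DataFrame(
    {'PRB': [5.0, np.nan, 300.0], 'UN': [np.nan, np.nan, 310.0]},
    index=[-8000, -5000, 1],
)

# isna() marks missing entries; summing counts them per column.
missing_per_column = toy.isna().sum()

# dropna() keeps only the dates for which this source has an estimate.
prb_known = toy['PRB'].dropna()

print(missing_per_column)
print(prb_known)
```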

Again, we'll replace the long column names with more convenient abbreviations.

In [None]:
table1.columns = ['PRB', 'UN', 'Maddison', 'HYDE', 'Tanton', 
                  'Biraben', 'McEvedy & Jones', 'Thomlinson', 'Durand', 'Clark']

Some of the estimates are in a form Pandas doesn't recognize as numbers, but we can coerce them to be numeric.

In [None]:
import pandas as pd

# Convert each column to numbers; entries that can't be parsed become NaN.
for col in table1.columns:
    table1[col] = pd.to_numeric(table1[col], errors='coerce')
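To show what coercion does, here is a small sketch with made-up strings. The range-style entry stands in for the kind of value that might not parse; it is an assumption for illustration, not a quote from the table.

```python
import pandas as pd

# Made-up entries: two plain numbers and one range that is not a number.
raw = pd.Series(['170', '150-330', '300'], name='toy')

# errors='coerce' replaces anything unparseable with NaN instead of raising.
numeric = pd.to_numeric(raw, errors='coerce')

is_missing = numeric.isna().tolist()
print(numeric)
print(is_missing)
```

Coercion is convenient, but it discards the unparseable entries silently, so it is worth checking how many values turned into `NaN`.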

Here are the results.  Notice that we are working in millions now, not billions.

In [None]:
table1.plot()
decorate(xlim=[-10000, 2000], xlabel='Year', 
         ylabel='World population (millions)',
         title='Prehistoric population estimates')
plt.legend(fontsize='small');

We can use `xlim` to zoom in on everything after Year 0.

In [None]:
table1.plot()
decorate(xlim=[0, 2000], xlabel='Year', 
         ylabel='World population (millions)',
         title='CE population estimates')
plt.legend(fontsize='small');

See if you can find a model that fits these data well from Year 0 to 1950.

How well does your best model predict actual population growth from 1950 to the present?
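One possible workflow, sketched under loud assumptions: fit a candidate model on the earlier dates, then measure its error on dates it has not seen. The code below uses synthetic data generated from a known exponential curve, not the real estimates, and the helper names `fit_exponential` and `mean_relative_error` are hypothetical. An exponential is only one candidate and may well fit the real estimates poorly; the point is the fit-then-test structure.

```python
import numpy as np

# Synthetic series, NOT real estimates: 200 (million) at year 0,
# growing 0.1% per year, sampled every 250 years.
years = np.arange(0, 2001, 250)
true_p0, true_rate = 200.0, 0.001
population = true_p0 * np.exp(true_rate * years)

def fit_exponential(t, p):
    """Least-squares line through log(p) vs t; returns (p0, rate)."""
    rate, log_p0 = np.polyfit(t, np.log(p), 1)
    return np.exp(log_p0), rate

def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / actual."""
    return np.mean(np.abs(predicted - actual) / actual)

# Fit on the earlier dates only, then test on the date held out.
train = years <= 1750
p0, rate = fit_exponential(years[train], population[train])
predicted = p0 * np.exp(rate * years)
holdout_error = mean_relative_error(population[~train], predicted[~train])

print(p0, rate, holdout_error)
```

Because the synthetic data are exactly exponential, the fit recovers the parameters and the held-out error is essentially zero. With the real estimates, the size of that error is what tells you how well a model extrapolates.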

In [None]:
# Solution goes here

In [None]:
# Solution goes here