# Lab 06 Growing Containers

You can join two strings with `+`:

`"Blueberry" + "Pie"`  

to get 

`"BlueberryPie"`

or even:

```python
pie = "Blueberry"
pie += "Pie"
```

The same operations work for tuples:

`(2, 4) + (6, 8)` 

to get 

`(2, 4, 6, 8)`

and

```python
even = (2, 4)
even += (6, 8)
```

But you did not *grow* your original containers; you made new containers.

You can't change a `str` or a `tuple` because these types are *immutable*.
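One way to see that `+=` made a new container is to keep a second name on the original object. This is a minimal sketch; the names `original`, `alias`, `numbers`, and `same_list` are made up for illustration. A list, covered in the next section, behaves differently because it is mutable.

```python
original = (2, 4)
alias = original             # two names, one tuple
alias += (6, 8)              # builds a new tuple and rebinds `alias` only

print(original)              # (2, 4)
print(alias)                 # (2, 4, 6, 8)
print(alias is original)     # False

numbers = [2, 4]
same_list = numbers          # two names, one list
same_list += [6, 8]          # changes that one list in place
print(numbers)               # [2, 4, 6, 8]
print(same_list is numbers)  # True
```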

## `list.append` and `list.extend`

In [1]:
my_list = [8, 3, 10, 2]

In [2]:
my_list.append(100)

In [3]:
my_list

[8, 3, 10, 2, 100]

In [4]:
my_list.extend([1000,2000])

In [5]:
my_list

[8, 3, 10, 2, 100, 1000, 2000]

In [6]:
my_list.append("Avocado")

In [7]:
my_list

[8, 3, 10, 2, 100, 1000, 2000, 'Avocado']

In [8]:
my_list.extend("Bacon")
my_list

[8, 3, 10, 2, 100, 1000, 2000, 'Avocado', 'B', 'a', 'c', 'o', 'n']

Note the error:

In [9]: my_list.extend(1.1)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-08516d9f908d> in <module>()
----> 1 my_list.extend(1.1)

TypeError: 'float' object is not iterable
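The difference between the two methods is easiest to see when the argument is itself a list. A small sketch, with `nested` and `flat` as illustrative names: `append` adds its argument as one element, while `extend` walks through the argument and adds each item.

```python
nested = [8, 3]
nested.append([10, 2])   # the whole list becomes a single element
print(nested)            # [8, 3, [10, 2]]
print(len(nested))       # 3

flat = [8, 3]
flat.extend([10, 2])     # each item is added separately
print(flat)              # [8, 3, 10, 2]
print(len(flat))         # 4
```

This is also why `extend(1.1)` fails: a float has no items to walk through.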


## `list +=` an Iterable

In [9]:
my_list = [8, 3, 10, 2]
my_list += [10000, 20000]
my_list

[8, 3, 10, 2, 10000, 20000]

In [10]:
my_list += 'Crab'
my_list

[8, 3, 10, 2, 10000, 20000, 'C', 'r', 'a', 'b']

Note the error:

```python
In [11]:  my_list += 3

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-3a8668037cd3> in <module>()
----> 1 my_list += 3

TypeError: 'int' object is not iterable

```

In [11]:
my_list += 3,                     # The *comma* is what matters.

In [12]:
my_list

[8, 3, 10, 2, 10000, 20000, 'C', 'r', 'a', 'b', 3]
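The trailing comma works because it turns `3` into a one-element tuple, and a tuple is iterable. A short sketch (the name `single` is only for illustration):

```python
single = 3,              # the comma makes a tuple, not the parentheses
print(type(single))      # <class 'tuple'>
print(single)            # (3,)

my_list = [8, 3]
my_list += 3,            # same as my_list += (3,)
my_list += [4]           # a one-element list works too
print(my_list)           # [8, 3, 3, 4]
```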

## `list.insert`

In [13]:
my_list.insert(4, 'x')
my_list

[8, 3, 10, 2, 'x', 10000, 20000, 'C', 'r', 'a', 'b', 3]
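The first argument to `insert` is the position *before which* the new item goes. The sketch below, using an illustrative list called `letters`, also tries a negative index and an index past the end of the list.

```python
letters = ['a', 'b', 'd']

letters.insert(2, 'c')     # goes in front of the item at index 2
print(letters)             # ['a', 'b', 'c', 'd']

letters.insert(-1, 'x')    # goes in front of the last item
print(letters)             # ['a', 'b', 'c', 'x', 'd']

letters.insert(100, 'z')   # an index past the end just appends
print(letters)             # ['a', 'b', 'c', 'x', 'd', 'z']
```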

## NumPy `concatenate` and `append`

In [14]:
import numpy as np
numbers = np.array([[1, 2], [3, 4]])
numbers

array([[1, 2],
       [3, 4]])

In [15]:
big_numbers = np.array([[100,200],[300,400]])
big_numbers

array([[100, 200],
       [300, 400]])

In [16]:
np.concatenate((numbers, big_numbers))

array([[  1,   2],
       [  3,   4],
       [100, 200],
       [300, 400]])

In [17]:
?np.append

```

Signature: np.append(arr, values, axis=None)
Docstring:
Append values to the end of an array.

Parameters
----------
arr : array_like
    Values are appended to a copy of this array.
values : array_like
    These values are appended to a copy of `arr`.  It must be of the
    correct shape (the same shape as `arr`, excluding `axis`).  If
    `axis` is not specified, `values` can be any shape and will be
    flattened before use.
axis : int, optional
    The axis along which `values` are appended.  If `axis` is not
    given, both `arr` and `values` are flattened before use.

Returns
-------
append : ndarray
    A copy of `arr` with `values` appended to `axis`.  Note that
    `append` does not occur in-place: a new array is allocated and
    filled.  If `axis` is None, `out` is a flattened array.

... more details are given.
```

In [18]:
np.append(numbers, big_numbers)

array([  1,   2,   3,   4, 100, 200, 300, 400])

In [19]:
np.append(numbers, big_numbers, 1)

array([[  1,   2, 100, 200],
       [  3,   4, 300, 400]])

In [20]:
np.append(numbers, big_numbers, 0)

array([[  1,   2],
       [  3,   4],
       [100, 200],
       [300, 400]])
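As the docstring says, `np.append` does not work in place. A minimal sketch of what that means in practice (`longer` is an illustrative name): the original array keeps its shape unless you assign the result back to it.

```python
import numpy as np

numbers = np.array([[1, 2], [3, 4]])

longer = np.append(numbers, [[5, 6]], axis=0)  # returns a new array
print(numbers.shape)   # (2, 2)  the original is unchanged
print(longer.shape)    # (3, 2)

# To "grow" numbers, rebind the name to the new array.
numbers = np.append(numbers, [[5, 6]], axis=0)
print(numbers.shape)   # (3, 2)
```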

In [21]:
numbers + big_numbers

array([[101, 202],
       [303, 404]])

In [22]:
numbers * big_numbers

array([[ 100,  400],
       [ 900, 1600]])

In [23]:
np.exp(numbers)

array([[  2.71828183,   7.3890561 ],
       [ 20.08553692,  54.59815003]])
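The last three cells show that arithmetic operators on arrays work element by element, so `+` does not grow an array the way it grows a list. A side-by-side sketch with illustrative names:

```python
import numpy as np

list_sum = [1, 2] + [3, 4]                        # concatenation
array_sum = np.array([1, 2]) + np.array([3, 4])   # element-wise addition
print(list_sum)            # [1, 2, 3, 4]
print(array_sum)           # [4 6]

list_twice = [1, 2] * 2                           # repetition
array_twice = np.array([1, 2]) * 2                # element-wise multiplication
print(list_twice)          # [1, 2, 1, 2]
print(array_twice)         # [2 4]
```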

## `Series.append`

In [24]:
import pandas as pd
s1 = pd.Series(['a','b'])
s2 = pd.Series([1,2])

In [25]:
s1.append(s2)

0    a
1    b
0    1
1    2
dtype: object

In [26]:
s1.append(s2, ignore_index=True)

0    a
1    b
2    1
3    2
dtype: object

In [27]:
2 * s2.append(s1)

0     2
1     4
0    aa
1    bb
dtype: object

In [28]:
s3 = pd.Series([4, 5])
s2 + s3

0    5
1    7
dtype: int64
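`Series.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cells above need an older pandas. On a newer version, `pd.concat` should give the same results. A sketch of the equivalent calls (`kept` and `renumbered` are illustrative names):

```python
import pandas as pd

s1 = pd.Series(['a', 'b'])
s2 = pd.Series([1, 2])

kept = pd.concat([s1, s2])                           # like s1.append(s2)
renumbered = pd.concat([s1, s2], ignore_index=True)  # like ignore_index=True above

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
print(list(renumbered))        # ['a', 'b', 1, 2]
```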

## `DataFrame.append` and `.join`

In [29]:
list_of_lists = [[24, 21, 23], 
                 [12, 18,  6], 
                 [6, 18, 12]]
df = pd.DataFrame(list_of_lists)
df

| |0|1|2|
|---|---|---|---|
|0|24|21|23|
|1|12|18|6|
|2|6|18|12|


In [30]:
df = df.append(df)
df

| |0|1|2|
|---|---|---|---|
|0|24|21|23|
|1|12|18|6|
|2|6|18|12|
|0|24|21|23|
|1|12|18|6|
|2|6|18|12|
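Notice that the index labels 0, 1, 2 now each appear twice. The sketch below rebuilds the same frame with `pd.concat` (an assumption for readers on pandas 2.0 or later, where `DataFrame.append` is no longer available) and shows what the repeated labels mean for lookups. The names `small`, `doubled`, and `renumbered` are illustrative.

```python
import pandas as pd

small = pd.DataFrame([[24, 21, 23],
                      [12, 18, 6],
                      [6, 18, 12]])

doubled = pd.concat([small, small])        # stacks the rows, keeps the labels
print(list(doubled.index))                 # [0, 1, 2, 0, 1, 2]
print(doubled.loc[0].shape)                # (2, 3)  label 0 matches two rows

renumbered = pd.concat([small, small], ignore_index=True)
print(list(renumbered.index))              # [0, 1, 2, 3, 4, 5]
```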


In [31]:
df['State'] = 'CA','NM','AZ', "WI", "TX", "NY"
df

| |0|1|2|State|
|---|---|---|---|---|
|0|24|21|23|CA|
|1|12|18|6|NM|
|2|6|18|12|AZ|
|0|24|21|23|WI|
|1|12|18|6|TX|
|2|6|18|12|NY|


```
In [ ]: ?df.join

Signature: df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Docstring:
Join columns with other DataFrame either on index or on a key
column. Efficiently Join multiple DataFrame objects by index at once by
passing a list.

Parameters
----------
other : DataFrame, Series with name field set, or list of DataFrame
    Index should be similar to one of the columns in this one. If a
    Series is passed, its name attribute must be set, and that will be
    used as the column name in the resulting joined DataFrame
on : column name, tuple/list of column names, or array-like
    Column(s) in the caller to join on the index in other,
    otherwise joins index-on-index. If multiples
    columns given, the passed DataFrame must have a MultiIndex. Can
    pass an array as the join key if not already contained in the
    calling DataFrame. Like an Excel VLOOKUP operation
how : {'left', 'right', 'outer', 'inner'}, default: 'left'
    How to handle the operation of the two objects.

    * left: use calling frame's index (or column if on is specified)
    * right: use other frame's index
    * outer: form union of calling frame's index (or column if on is
      specified) with other frame's index, and sort it
      lexicographically
    * inner: form intersection of calling frame's index (or column if
      on is specified) with other frame's index, preserving the order
      of the calling's one
lsuffix : string
    Suffix to use from left frame's overlapping columns
rsuffix : string
    Suffix to use from right frame's overlapping columns
sort : boolean, default False
    Order result DataFrame lexicographically by the join key. If False,
    the order of the join key depends on the join type (how keyword)
```

In [32]:
state_numbers = pd.DataFrame([1, 2, 0, 5, 4, 3], index = ["CA","NM","AZ", "WI", "TX", "NY"])
state_numbers                                                           

| |0|
|---|---|
|CA|1|
|NM|2|
|AZ|0|
|WI|5|
|TX|4|
|NY|3|


In [33]:
df = df.join(state_numbers, on="State", how="outer", lsuffix='-l')
df

| |0-l|1|2|State|0|
|---|---|---|---|---|---|
|0|24|21|23|CA|1|
|1|12|18|6|NM|2|
|2|6|18|12|AZ|0|
|0|24|21|23|WI|5|
|1|12|18|6|TX|4|
|2|6|18|12|NY|3|
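Here `on="State"` tells pandas to look up each value in the caller's `State` column in the *index* of the other frame. A smaller sketch with made-up frames (`counts` and `codes` are hypothetical names and data) shows the default `how='left'` and what happens when a key has no match:

```python
import pandas as pd

counts = pd.DataFrame({"State": ["CA", "NM", "TX"],
                       "Count": [24, 12, 6]})
codes = pd.DataFrame({"Code": [1, 2]}, index=["CA", "NM"])

joined = counts.join(codes, on="State")   # how='left' by default
print(joined)

# "TX" is not in codes.index, so its Code is NaN,
# and the Code column becomes float to hold it.
print(joined["Code"].isna().tolist())     # [False, False, True]
```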


In [34]:
df[3]=['look', 'at', 'that'] * 2
df

| |0-l|1|2|State|0|3|
|---|---|---|---|---|---|---|
|0|24|21|23|CA|1|look|
|1|12|18|6|NM|2|at|
|2|6|18|12|AZ|0|that|
|0|24|21|23|WI|5|look|
|1|12|18|6|TX|4|at|
|2|6|18|12|NY|3|that|


# Exercises

`1.` Here are some beverages:  "Water", "Tea", "Coffee".  And here are some more: "Milk", "Kefir", "Lemonade".

> a. Make two lists of 3 beverages each.

> b. Put your two lists into one big list in three different ways.

> c. Add "Beer" to your big list in 3 different ways.

> d. Get "Beer" into your `list` one more time, this time at the front of the list.

`2.`  Matching.  Some on the left have more than one answer on the right, and vice versa:

|with this data type|you use this|facility to grow or sort|
|-------------:|:---:|:----|
|list| |1. sort_values|
| | |2. sorted|
|np.array| |3. += |
| | |4. sort|
|pd.Series| |5. append|
| | |6. extend|
|pd.DataFrame| |7. concatenate|
| | |8. join|

`3.` NumPy has its own `random` sub-module.  Discover what's in it using tab completion (type `np.random.` and press Tab) or `dir(np.random)`.

>  a. Practice by making a 3 x 4 matrix of normally distributed random numbers.  

>  b. Study the result of `?np.random.random` and/or `?np.random.random_sample`.  They are the same.  Make a 3 x 5 matrix of random numbers, uniformly distributed between 1 and 100.

>  c. Paste your two matrices together so you have a 3 x 9 matrix using `np.append`.

>  d. Study the help for `np.concatenate`.  Use `concatenate` to get the same result.

>  e. Check your results by using `c_result.shape` and `d_result.shape`.  Also, check `c_result == d_result`.

`4.` Put these data in a DataFrame:

```python
data = [["one", "two"],
        [1    , 2    ]]
```

> a. Add a `"three", 3` column.

> b. Add a row of the Spanish words: uno, dos, tres; or use your own language.

> c. Rename the index to be "English", "Number", "Spanish" and the columns to be 1, 2, 3.

> d. With `df.reindex(new_index)` you can move the index to a different order and each row will follow its index.
> Use `df.reindex` to make the order be ["Number", "English", "Spanish"].

In [None]:
# 5.a. Here are some data representing some states' populations.  Get them into a DataFrame.

index= ['California', 'Florida', 'Pennsylvania', 'Ohio', 'North Carolina', 'New Jersey', 'Washington', 'Massachusetts', 'Indiana', 'Maryland', 'Colorado', 'South Carolina', 'Louisiana', 'Oregon', 'Connecticut', 'Utah', 'Nevada', 'Kansas', 'Nebraska', 'Idaho', 'New Hampshire', 'Rhode Island', 'Delaware', 'North Dakota', 'Vermont']
columns= ['Pop 2017', 'Rank 2017', 'Pop/Seat, 2017']
values = [['39,536,653', 1, '718,848'],
['20,984,400', 3, '734,904'],
['12,805,537', 5, '640,277'],
['11,658,609', 7, '647,701'],
['10,273,419', 9, '684,895'],
['9,005,644', 11, '643,260'],
['7,405,743', 13, '617,145'],
['6,859,819', 15, '623,620'],
['6,666,818', 17, '606,074'],
['6,052,177', 19, '605,218'],
['5,607,154', 21, '623,017'],
['5,024,369', 23, '558,263'],
['4,684,333', 25, '585,542'],
['4,142,776', 27, '591,825'],
['3,588,184', 29, '512,598'],
['3,101,833', 31, '516,972'],
['2,998,039', 33, '499,673'],
['2,913,123', 35, '485,521'],
['1,920,076', 37, '384,015'],
['1,716,943', 39, '429,236'],
['1,342,795', 41, '335,699'],
['1,059,639', 43, '264,910'],
['961,939', 45, '320,646'],
['755,393', 47, '251,798'],
['623,657', 49, '207,886']]

In [None]:
# 5.b. And here are some more states and their populations. Get them into a DataFrame.

data = [['State', 'Pop 2017', 'Rank 2017', 'Pop/Seat, 2017'],
['Texas', '28,304,596', 2, '744,858'],
['New York', '19,849,399', 4, '684,462'],
['Illinois', '12,802,023', 6, '640,101'],
['Georgia', '10,429,379', 8, '651,836'],
['Michigan', '9,962,311', 10, '622,644'],
['Virginia', '8,470,020', 12, '651,540'],
['Arizona', '7,016,270', 14, '637,843'],
['Tennessee', '6,715,984', 16, '610,544'],
['Missouri', '6,113,532', 18, '611,353'],
['Wisconsin', '5,795,483', 20, '579,548'],
['Minnesota', '5,576,606', 22, '557,661'],
['Alabama', '4,874,747', 24, '541,639'],
['Kentucky', '4,454,189', 26, '556,774'],
['Oklahoma', '3,930,864', 28, '561,552'],
['Iowa', '3,145,711', 30, '524,285'],
['Arkansas', '3,004,279', 32, '500,713'],
['Mississippi', '2,984,100', 34, '497,350'],
['New Mexico', '2,088,070', 36, '417,614'],
['West Virginia', '1,815,857', 38, '363,171'],
['Hawaii', '1,427,538', 40, '356,885'],
['Maine', '1,335,907', 42, '333,977'],
['Montana', '1,050,493', 44, '350,164'],
['South Dakota', '869,666', 46, '289,889'],
['Alaska', '739,795', 48, '246,598'],
['Wyoming', '579,315', 50, '193,105']]


In [None]:
# 5.c.  Finally, here are some data for each state's symbols.  Make them into another dataframe.

index= ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming', 'Washington, D.C.']
columns= ['Capital', 'Largest City', 'Bird', 'Flower']
values = [['Montgomery', 'Birmingham', 'Yellowhammer', 'Camellia'],
['Juneau', 'Anchorage', 'Willow Ptarmigan', 'Forget-me-not'],
['Phoenix', 'Phoenix', 'Cactus Wren', 'Saguaro Cactus Blossom'],
['Little Rock', 'Little Rock', 'Mockingbird', 'Apple Blossom'],
['Sacramento', 'Los Angeles', 'California Valley Quail', 'Golden Poppy'],
['Denver', 'Denver', 'Lark Bunting', 'Columbine'],
['Hartford', 'Bridgeport', 'American Robin', 'Mountain Laurel'],
['Dover', 'Wilmington', 'Blue Hen Chicken', 'Peach Blossom'],
['Tallahassee', 'Jacksonville', 'Mockingbird', 'Orange Blossom'],
['Atlanta', 'Atlanta', 'Brown Thrasher', 'Cherokee Rose'],
['Honolulu', 'Honolulu', 'Nene (Hawaiian Goose)', 'Hibiscus'],
['Boise', 'Boise', 'Mountain Bluebird', 'Syringa'],
['Springfield', 'Chicago', 'Cardinal', 'Native violet'],
['Indianapolis', 'Indianapolis', 'Cardinal', 'Peony'],
['Des Moines', 'Des Moines', 'Eastern Goldfinch', 'Wild Rose'],
['Topeka', 'Wichita', 'Western Meadowlark', 'Native Sunflower'],
['Frankfort', 'Louisville', 'Kentucky Cardinal', 'Goldenrod'],
['Baton Rouge', 'New Orleans', 'Pelican', 'Magnolia'],
['Augusta', 'Portland', 'Chickadee', 'White Pine Cone and Tassel'],
['Annapolis', 'Baltimore', 'Baltimore Oriole', 'Black-Eyed Susan'],
['Boston', 'Boston', 'Chickadee', 'Mayflower'],
['Lansing', 'Detroit', 'Robin', 'Apple Blossom'],
['St. Paul', 'Minneapolis', 'Common Loon', "Pink and White Lady's Slipper"],
['Jackson', 'Jackson', 'Mockingbird', 'Magnolia'],
['Jefferson City', 'Kansas City', 'Bluebird', 'Hawthorn'],
['Helena', 'Billings', 'Western Meadowlark', 'Bitterroot'],
['Lincoln', 'Omaha', 'Western Meadowlark', 'Goldenrod'],
['Carson City', 'Las Vegas', 'Mountain Bluebird', 'Sagebrush'],
['Concord', 'Manchester', 'Purple Finch', 'Purple Lilac'],
['Trenton', 'Newark', 'Eastern Goldfinch', 'Purple Violet'],
['Santa Fe', 'Albuquerque', 'Roadrunner', 'Yucca Flower'],
['Albany', 'New York', 'Bluebird', 'Rose'],
['Raleigh', 'Charlotte', 'Cardinal', 'Dogwood'],
['Bismarck', 'Fargo', 'Western Meadowlark', 'Wild Prairie Rose'],
['Columbus', 'Columbus', 'Cardinal', 'Scarlet Carnation'],
['Oklahoma City', 'Oklahoma City', 'Scissor-Tailed Flycatcher', 'Mistletoe'],
['Salem', 'Portland', 'Western Meadowlark', 'Oregon Grape'],
['Harrisburg', 'Philadelphia', 'Ruffed Grouse', 'Mountain Laurel'],
['Providence', 'Providence', 'Rhode Island Red', 'Violet'],
['Columbia', 'Columbia', 'Carolina Wren', 'Yellow Jessamine'],
['Pierre', 'Sioux Falls', 'Ring-Necked Pheasant', 'American Pasqueflower'],
['Nashville', 'Memphis', 'Mockingbird', 'Iris'],
['Austin', 'Houston', 'Mockingbird', 'Bluebonnet'],
['Salt Lake City', 'Salt Lake City', 'California Gull', 'Sego Lily'],
['Montpelier', 'Burlington', 'Hermit Thrush', 'Red Clover'],
['Richmond', 'Virginia Beach', 'Cardinal', 'Dogwood'],
['Olympia', 'Seattle', 'Willow Goldfinch', 'Western Rhododendron'],
['Charleston', 'Charleston', 'Cardinal', 'Big Rhododendron'],
['Madison', 'Milwaukee', 'Robin', 'Wood Violet'],
['Cheyenne', 'Cheyenne', 'Meadowlark', 'Indian Paintbrush'],
['None', 'Washington', 'Woodthrush', 'American Beauty Rose']]


> `5. d.` Get all that data into one DataFrame, joined on the state.

`5. e.` Note that Washington, D.C. has no population information and no state capital (its capital is recorded as the string `'None'`). The missing population fields are filled with NaN (Not a Number) by default, which is available to you as `np.nan`.

Optional now because we will have a lab about this:  Check out `?pd.DataFrame.fillna` to see how to change them from NaN to "missing".  The first argument mentioned is `self`, which is an identifier for the DataFrame that is calling the method.  Python fills that in so you can ignore it.

`df.head()` shows you the first 5 rows of the DataFrame.

`df.tail()` shows you the last 5 rows of the DataFrame.