# Series and views
* There are two hidden but powerful parts of Pandas `DataFrame`s
* `Series` is the type of a single column of a `DataFrame`.
   * It enables column operations.
   * It acts like a `numpy` `ndarray`.
* Views are subsets of the original `DataFrame`; editing a view changes the original.
   * The `.loc[]` indexing syntax is how we work with them.
   * This is the easiest way to edit a `DataFrame`.

# The hidden type: Series

When we act on columns in a `DataFrame`, they are actually of type `Series`. 
* `Series` acts a lot like an `ndarray`.
* Most `ndarray` functions are supported.
* The default index is the integer offset (0, 1, 2, ...).

But `Series` is, in fact, much more expressive than `ndarray`:
* It can be indexed by non-numeric data, i.e., one can "label" rows.
* Operations can be optimized by careful choices in indexing.

Consider:

In [None]:
import pandas as pd
d1 = pd.DataFrame({ 'a': [1,2,3], 'b': [4,5,6], 'c': [7,8,9]})
d1

In [None]:
d1['a']  # one column

In [None]:
type(d1['a'])  # it's a Series

In [None]:
d1['a'][1]  # [column][row]

In [None]:
d1['a'].sum()  # all rows 

In [None]:
d1['b'].mean()  # all rows 
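To make the `ndarray`-like and label-based sides of `Series` concrete, here is a minimal sketch. The Series `s` and its labels are made-up illustration data, not part of the notebook's `d1`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a Series labeled with strings instead of 0..n-1.
s = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

doubled = s * 2            # elementwise arithmetic, as with an ndarray
roots = np.sqrt(s)         # NumPy ufuncs accept a Series and keep its labels
y_value = s['y']           # label-based lookup, which a plain ndarray lacks
as_array = s.to_numpy()    # the underlying values as an ndarray
```

Note that the results of `s * 2` and `np.sqrt(s)` are themselves `Series` carrying the same labels.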

# A few caveats
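1. Writing through a `Series` obtained with `df[column]` is *chained assignment*, and it is not a reliable way to change the original. Depending on your pandas version, the write may or may not reach the `DataFrame`, and pandas may print an interesting warning (such as `SettingWithCopyWarning`):

In [None]:
d1['b'][1] = 20

Check whether `d1` changed rather than assuming; with Copy-on-Write enabled (which the pandas documentation describes as the default from pandas 3.0), it does not: 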

In [None]:
d1
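For contrast with the chained assignment above, the following sketch sets the same cell with a single `.loc` indexing step, which writes into the `DataFrame` itself. The name `demo` is a hypothetical stand-in so that `d1` is left alone.

```python
import pandas as pd

# Hypothetical stand-in for d1, so the notebook's own frame is untouched.
demo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# One indexing step: row label 1, column 'b'. This writes into demo itself.
demo.loc[1, 'b'] = 20
```

Because the row and column are given together, there is no intermediate object for the write to get lost in.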

# Indexes
An index is a set of labels for rows. The default index is the integers 0 to n-1. Index labels can be almost anything. Let's use letters. 

In [None]:
d1['labels'] = ['d', 'e', 'f']
d1

In [None]:
d2 = d1.set_index('labels')
d2

In [None]:
d2['a']

# Whoa there! What just happened?
* `set_index` returns a new `DataFrame` by default; the original `d1` is unchanged.
* `Series` also support row labels. 
* Changing the labels on a `DataFrame` changes the labels on all of its `Series`. 
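A small sketch of these three points, using hypothetical names `frame` and `relabeled` rather than the notebook's `d1` and `d2`:

```python
import pandas as pd

# Hypothetical example frame with a column we will promote to the index.
frame = pd.DataFrame({'a': [1, 2, 3], 'labels': ['d', 'e', 'f']})
relabeled = frame.set_index('labels')        # returns a new DataFrame

original_index = list(frame.index)           # still the default 0, 1, 2
new_index = list(relabeled.index)            # now 'd', 'e', 'f'
series_index = list(relabeled['a'].index)    # the column carries the same labels
```

The `labels` column moves out of the columns and into the index of `relabeled`, while `frame` keeps it as an ordinary column.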

We can access by column and row, as before: 

In [None]:
d2['a']['e']

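but the following less intuitive `.loc[rows, columns]` syntax is recommended, because it selects rows and columns in a single step, which can be faster and avoids the pitfalls of chained indexing. 
* `:'e'` is a *row range:* labels up to and including 'e'.
* `'b':` is a *column range:* labels from 'b' onward. 
* `:` by itself denotes all.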

In [None]:
d2.loc[:'e','b':]  # rows up to 'e', columns from 'b' on
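One detail worth seeing in isolation: label slices with `.loc` include the stop label, unlike positional slices. The sketch below uses a hypothetical `frame` shaped like `d2` and compares `.loc` with the position-based `.iloc`.

```python
import pandas as pd

# Hypothetical frame with the same shape and labels as d2.
frame = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['d', 'e', 'f'],
)

by_label = frame.loc[:'e', 'b':]    # rows 'd' and 'e', columns 'b' and 'c'
by_position = frame.iloc[:1, 1:]    # row 'd' only: position slices exclude the stop
```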

# Not particularly intuitive, but very powerful. 
* The addressing form `.loc[]` above can also appear on the left side of an assignment. 
* Consider:

In [None]:
d2.loc[:'e', 'b':] = 42
d2

The assignment set multiple cells to a value. 
This is a special case of a more general property. 
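As a sketch of that more general property, `.loc` assignment accepts other selectors besides ranges. The example below assumes a hypothetical `frame` and uses a boolean condition and a list of labels.

```python
import pandas as pd

# Hypothetical frame with the same shape and labels as d2.
frame = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['d', 'e', 'f'],
)

# Rows where column 'a' is at least 2; only column 'c' is overwritten.
frame.loc[frame['a'] >= 2, 'c'] = 0

# A list of row labels works as a selector too.
frame.loc[['d', 'f'], 'b'] = -1
```

In each case the selection and the assignment happen in one step on `frame` itself.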

# Copies and views

In dealing with Pandas, there are two kinds of derived data: 
* *Copies* are decoupled from the original data. 
* *Views* retain their coupling with the original data. 

The meaning of the word *view* is consistent with its use in databases. 

The key issue is again *mutability*. 
* Changing a view changes the original data. 
* Changing a copy does not. 

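The curious notation `df.loc[rows, columns]` addresses the original `DataFrame` directly. 
* Assigning to `df.loc[rows, columns]` changes `df` itself. 
* A selection saved from `.loc` may behave as a *view*, so that changing it changes the original `DataFrame`! Pandas does not guarantee this; it depends on the selection and on the pandas version. 

The more typical notation `df[columns][rows]` should be treated as creating a *copy*. 
* Treat the copy as independent of the original. 
* Don't rely on changes to it reaching the original data. 
* The first bracket produces the intermediate object that the second bracket acts on. 
* This avoids confusion when using row expressions. 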

Consider, e.g., 

In [None]:
v1 = d2.loc['e':,'b':]  # a view
v1



In [None]:
v1.loc['e','b']=100
v1

In [None]:
d2

# Whoa there! What happened?
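If `d2` shows the new value, then `v1` was a view: an alias for a subset of `d2`, so changing `v1` changed `d2`. This depends on the pandas version; with Copy-on-Write enabled, `v1` behaves as a copy and `d2` stays unchanged. 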
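Since view-versus-copy behavior can differ between pandas versions, it helps to test it rather than assume it. The following is a minimal sketch of one way to do that: take a snapshot, modify the derived object, and compare. The names `frame`, `snapshot`, `sub`, and `original_changed` are hypothetical, and the assignment may print a warning on some versions.

```python
import pandas as pd

# Hypothetical frame with the same shape and labels as d2.
frame = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['d', 'e', 'f'],
)

snapshot = frame.copy()          # an independent record of the starting state
sub = frame.loc['e':, 'b':]      # derived object: view or copy?
sub.loc['e', 'b'] = 100          # modify the derived object

# True if the write reached frame (sub acted as a view), False otherwise.
original_changed = not frame.equals(snapshot)
```

Printing `original_changed` in your own environment tells you which behavior your pandas installation has.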

# Views can be partial

In [None]:
v1['foo'] = True  # a new column, not part of the view
v1

In [None]:
v1.loc['e', 'c'] = 200
v1


In [None]:
d2

# Copies are decoupled
Consider: 

In [None]:
c1 = d2[['b', 'c']][:'e']  # copied 
c1

In [None]:
c1.loc['e', 'b'] = 300  # technically a view of a copy(!)
c1  # gets changed

In [None]:
d2  # doesn't reflect change of copy. 
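When independence is what you want, calling `.copy()` states it explicitly instead of relying on how a particular selection happens to behave. A minimal sketch, with hypothetical names `frame` and `independent`:

```python
import pandas as pd

# Hypothetical frame with the same shape and labels as d2.
frame = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['d', 'e', 'f'],
)

# .copy() makes the result independent, whatever the selection returned.
independent = frame.loc[:'e', ['b', 'c']].copy()
independent.loc['e', 'b'] = 300   # changes only the copy
```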

# Why is this so weird? 
* Pandas is an evolving library. 
* The copy syntax (e.g., `df[columns][rows]`) evolved first, to enable column operations. 
* The view syntax (e.g., `df.loc[rows, columns]`) evolved last, to enable setting cells easily (and for efficiency). 
* People were already using the copy syntax widely, and Pandas couldn't change that without breaking users' code. 
* So Pandas instituted a new, separate syntax for the different use case. 

# Labels on series
* Series can be labeled as well. 
* They inherit their labels from the `DataFrame`. 
* All `Series` taken from the same `DataFrame` share exactly the same row labels. 
* Some of the `Series` queries look like `DataFrame` queries. 

Consider

In [None]:
s1 = d1['b']
s1
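To see how `Series` queries resemble `DataFrame` queries, here is a sketch on a labeled column. The `frame` and `series` names are hypothetical; the frame is shaped like `d2` so that the labels are letters.

```python
import pandas as pd

# Hypothetical frame with the same shape and labels as d2.
frame = pd.DataFrame(
    {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]},
    index=['d', 'e', 'f'],
)

series = frame['b']              # inherits the row labels 'd', 'e', 'f'
one = series.loc['e']            # single label, like frame.loc['e', 'b']
span = series.loc['d':'e']       # label range, inclusive of 'e'
large = series[series > 4]       # boolean filter keeps the matching labels
```

The only difference from the `DataFrame` form is that `.loc` on a `Series` takes a row selector alone, since there is just one column.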

# Let's put this into practice
First, let's register you for grading. 

In [None]:
# Don't change this cell; just run it. 
from client.api.notebook import Notebook
ok = Notebook('03-06-dataframe-views.ok')
ok.auth(inline=True)

Let's make up a test `DataFrame`: 

In [None]:
df = pd.DataFrame({
    'name': ['Garfield', 'Bill', 'Snoopy', 'Dogbert'],
    'kind': ['cat', 'cat', 'dog', 'dog'],
    'weight': [20, 10, 15, 10],
    'food': ['lasagna', 'roadkill', 'canned', 'pate']
})
df

1. Create a new `DataFrame` `pets` from `df` that is indexed by name. 

In [None]:
# your answer: 
pets = ...
pets

In [None]:
_ = ok.grade('q01')  # run this to check your work. 

2. In `pets`, set Snoopy's weight to 16. 

In [None]:
# Your answer:
...
pets

In [None]:
_ = ok.grade('q02')  # run this to check your work. 

3. Create a copy `dogs` that consists of just the dogs in `pets`.

In [None]:
# Your answer: 
dogs = ...
dogs

In [None]:
_ = ok.grade('q03')  # run this to check your work. 

4. In `dogs`, set Dogbert's weight to 25. Depending on your pandas version, this may print a warning. 

In [None]:
# Your answer: 
...
dogs

In [None]:
_ = ok.grade('q04')  # run this to check your work. 

In [None]:
pets  # what happened to the original?

5. Create a Series `weights` of `dogs` with just the weights.

In [None]:
weights = ...
weights

In [None]:
_ = ok.grade('q05')  # run this to check your work. 

6. Change Dogbert's weight to 35 in the copy `weights`. Depending on your pandas version, this may print a warning. 

In [None]:
# Your answer: 
...
weights

In [None]:
_ = ok.grade('q06')  # run this to check your work. 

In [None]:
dogs  # Did you change the copy? 

(Editor's note: This is amusing. Pandas both warns me that the change won't reach the original and then changes it anyway. If the type of this object were `DataFrame`, the warning would be reasonable, but the `weights` object is of type `Series`, so the warning is moot.)

In [None]:
pets  # check that you didn't change the top-level original

7. **Challenge problem:** (optional) Create a version of `dogs` that is a *view* and demonstrate that it is a view by making a change in the view that is reflected in `pets`. I have been unable to do this! I wonder if it's possible!

In [None]:
# Your answer: 
dogs = ...
dogs

# When you are done with this notebook, 

* Save and checkpoint. 
* Ensure that the name of this file is precisely `03-06-dataframe-views.ipynb`. 
* <del>Change `ready` to `True` in the cell below. </del>
* <del>Run the cell below to submit your work for grading. </del>
* Save and checkpoint the notebook. 

* If your Jupyter installation can download the notebook as a PDF,
    * (File >> Download as >> PDF via LaTeX (.pdf)), 
    * Rename the downloaded file to `<loginid>-03-06-dataframe-views.pdf`. In other words, my filename would be `jsingh11-03-06-dataframe-views.pdf`.
    * Submit the file `<loginid>-03-06-dataframe-views.pdf` to Canvas.
* Otherwise,
    * (File >> Download as >> Notebook (.ipynb)), 
    * Rename the downloaded file to `<loginid>-03-06-dataframe-views.ipynb`. In other words, my filename would be `jsingh11-03-06-dataframe-views.ipynb`.
    * Submit the file `<loginid>-03-06-dataframe-views.ipynb` to Canvas.