# Class 3: Data Structures

### 3.1 Review

---

* Retrieving Data Files Manually
* Reading in ASCII/Binary
* Use of string manipulation

### 3.2 Automation (somewhat)

---

In this section, we will revisit the problem we worked on in class to show how one might automate this routine programmatically, and to learn more advanced ways of reading and manipulating data.

### 3.2.1 Imports

---

Much of Python's power comes from importing the many community-developed packages (also called libraries) that let you perform tasks easily and with less effort.

```python
import this  # import statement with the package/module named 'this'
```

We have seen this import statement before: it prints "The Zen of Python" (PEP 20). For the proper way to write imports, refer to this Python Enhancement Proposal:

__PEP8__

The Python Style Guide, PEP 8 (Python Enhancement Proposal 8), tells you the recommended way to write imports, name functions, and lay out your code in general. PEP 8 itself was never renamed; in 2016, the command-line tool that checks code against it, formerly called `pep8`, was renamed to `pycodestyle`.

* [PEP8](http://www.python.org/dev/peps/pep-0008/)
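* [pycodestyle](http://github.com/PyCQA/pycodestyle) - a Python package that checks your code against PEP 8 for you (named `pep8` before 2016).
* [pep8](http://pep8.readthedocs.io/en/release-1.7.x/) - documentation for the older `pep8` release of that same checker.
* [pep8.org](http://pep8.org) - a more human-friendly presentation of PEP 8.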

For our automation, we need to import two packages into the Python interpreter's environment. That is, we are adding new names to the interpreter's namespace (imports do not create new keywords).

```python
import urllib.request
import json
```

You can import multiple packages on the same line (e.g., `import urllib.request, json`) or on separate lines, as seen above. I usually choose the latter because it lets you group imports in a logical manner, and PEP 8 recommends it. Here, we have imported the `request` submodule of the `urllib` package, which lets us open URLs, send HTTP requests, and receive the responses. Second, we import the `json` package, which parses the JSON format that the SWPC website serves its data in.

_Note:_ Within a session, you do not have to re-import packages once they are loaded, but you do need to import them every time you start a new session or run this script. Otherwise, you will get a `NameError`, because `json` and `urllib` are not defined names in a fresh Python session.
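To see this for yourself, the minimal sketch below tries to use a module before importing it, then imports it and tries again. It uses the standard-library `statistics` module purely as an illustration; any module behaves the same way. (Imports normally belong at the top of a file; this one is placed mid-script only for the demonstration.)

```python
# Check whether the name 'statistics' is defined before importing it.
print('statistics' in globals())  # False in a fresh session

try:
    statistics.mean([1, 2, 3])
except NameError as err:
    print('Before import:', err)

import statistics  # binds the name 'statistics' in the current namespace

print('statistics' in globals())  # now True
print(statistics.mean([1, 2, 3]))
```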

> Python imports are like libraries/utilities that others have written for you to use. You can import packages (collections of modules) or modules (single `.py` files).

Basic Example:

```python
import numpy # a fantastic numerical Python package
print(numpy)
```

1. Imports can be renamed:

   ```python
   import numpy as np
   print(np)
   ```

2. You can import submodules directly:

   ```python
   from numpy import ma
   print(ma)
   ```

3. You can import specific parts of modules as well (i.e., an object, function, or class within the module):

   ```python
   from numpy.ma import masked_array
   print(masked_array)
   ```
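Each of these forms simply binds a name to an existing object; none of them makes a copy. The sketch below swaps NumPy for the standard-library `os.path` module (so it runs without extra installs) to show that the renamed module, the directly imported module, and the imported function all refer to the very same objects. The file name `plasma.json` is only an illustration.

```python
import os.path as osp          # 1. rename on import
from os import path            # 2. import the path module from os directly
from os.path import join       # 3. import a single function

# All three names point at the very same objects.
print(osp is path)        # True
print(path.join is join)  # True
print(join('data', 'plasma.json'))  # separator depends on the operating system
```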

```python
url = 'http://services.swpc.noaa.gov/products/solar-wind/plasma-2-hour.json'
with urllib.request.urlopen(url) as f:
    data = json.loads(f.read().decode())
```

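Here, `f.read()` returns raw bytes, so I call the `decode` method to convert those bytes into a string (`str`, decoded as UTF-8 by default) that `json.loads` can parse. Since Python 3.6, `json.loads` also accepts bytes directly, but decoding explicitly makes the conversion visible.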
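Because the live request needs a network connection, here is an offline sketch of the same two steps. The byte string is a made-up, shortened sample shaped like the SWPC plasma feed (a header row followed by data rows); the real feed's values and number of rows will differ.

```python
import json

# Hypothetical sample standing in for f.read(): the raw response is bytes, not str.
raw = b'[["time_tag","density","speed","temperature"],["2017-01-01 00:00:00.000","5.1","420.3","90000"]]'
print(type(raw))      # <class 'bytes'>

text = raw.decode()   # bytes -> str, using UTF-8 by default
print(type(text))     # <class 'str'>

data = json.loads(text)
print(data[0])        # the header row
print(data[1][2])     # speed value of the first data row, still a string
```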

### 3.3 Data Structures

---

We will only look at a single data structure for now, but there are other advantageous ones that we will learn about in the following lectures. Not every feature will be presented, but you can keep this handy link as a reference:

[Official Documentation - Data Structures](http://docs.python.org/3/tutorial/datastructures.html)

### 3.3.1 Lists

---

A list is a mutable (i.e., changeable) structure that groups items into a single, ordered collection. To think about lists, I like to picture a grocery list: items can be added, removed, or moved around, and the list can hold varying types of items and quantities.
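A short sketch of the grocery-list analogy, using only built-in list methods (`append`, `insert`, `remove`, `pop`); the items themselves are made up.

```python
groceries = ['eggs', 'milk', 'bread']

groceries.append('apples')     # add to the end
groceries.insert(0, 'coffee')  # add at a specific position
groceries.remove('milk')       # remove the first matching item
last_item = groceries.pop()    # remove and return the last item ('apples')

groceries[1] = 12              # replace 'eggs' with a quantity, a different type
print(groceries)               # ['coffee', 12, 'bread']
```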

```python
my_list = []  # an empty list
my_grocery_list = ['eggs', 2, ['a', 'nested', [list('string')]]]  # mixed types, including nested lists
```

Lists, like most variables in Python, do not have to be declared or have memory allocated for them ahead of time. In our space weather example, the variable `data` is a list: the JSON response is an array, which `json.loads` converts into a Python list, and that is why its printed contents are enclosed in square brackets.
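To confirm the type rather than inferring it from the brackets, you can call `type()` or `isinstance()`. The sketch below uses a tiny made-up JSON string in place of the SWPC response, and also shows that a list grows on demand without any size being declared up front.

```python
import json

# Hypothetical stand-in for the SWPC response: a JSON array of arrays.
data = json.loads('[["time_tag", "speed"], ["00:00", "420.3"]]')
print(type(data))              # <class 'list'>
print(isinstance(data, list))  # True

# No size or memory needs to be declared; the list grows as items are appended.
speeds = []
speeds.append(420.3)
speeds.append(415.8)
print(len(speeds))             # 2
```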

### 3.3.2 Slicing

---

Just like strings, lists can be sliced to select sections or create sub-lists from a larger collection of data. The diagram and example below use a string, but the same indexing rules apply to lists.

```
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
```

> In Python, indexes start at zero, not one.

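```python
name = 'Brent Smith'
last = name[6:]        # 'Smith' (from index 6 to the end)
first = name[:5]       # 'Brent' (from the start up to, but not including, index 5)
first = name[:-6]      # 'Brent' again, counting from the end instead
skip = name[::2]       # 'BetSih' (every second character)
substring = name[3:8]  # 'nt Sm'
```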

```python
print(substring)
```
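The same slice syntax works on lists. A minimal sketch with a list of hypothetical solar wind speeds:

```python
speeds = [420.3, 415.8, 430.1, 428.7, 433.0, 441.2]  # hypothetical values

print(speeds[0])      # first item: 420.3
print(speeds[-1])     # last item: 441.2
print(speeds[1:4])    # indexes 1, 2, 3: [415.8, 430.1, 428.7]
print(speeds[:3])     # first three items
print(speeds[::2])    # every other item: [420.3, 430.1, 433.0]
print(speeds[::-1])   # a reversed copy

header_and_rows = [['time_tag', 'speed'], ['00:00', '420.3'], ['00:01', '415.8']]
rows = header_and_rows[1:]  # drop the header row, keep the data rows
```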

### 3.4 Loops

---

There are two types of loops that we will encounter in Python:
* for
* while

### 3.4.1 The `for` Loop

---

When we want to step through a sequence (or any other iterable) item by item, we choose `for` loops. These are very powerful for performing repetitive operations.

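```python
iterable = ['a', 'b', 'c']  # any sequence or other iterable works here

for item in iterable:
    # perform some type of operation; item is a variable within this block
    print(item)
# after the loop, item still refers to the last element of the iterable ('c');
# if the iterable were empty, item would never be defined
```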

```python
days_in_year = 365

for day in range(days_in_year):
    print('Day {today}'.format(today=day))
```

To explain, the `for` loop assigns each value produced by `range(days_in_year)` (the integers 0 through 364) to the variable `day`, one at a time. The statements indented beneath the `for` line form the loop body; PEP 8 recommends 4 spaces per indentation level, and you must be consistent within a block. Inside the body, you can use the current value of `day` in any operation you like.

If the iterable is empty, there is nothing to iterate over, so the indented loop body is never executed.
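A `for` loop works over any iterable, not only `range`. The sketch below loops over a list of names (made up for illustration) and then over an empty list, whose loop body never runs.

```python
names = ['Brent', 'Bob', 'Alice']

for name in names:
    print('Hello, {}!'.format(name))

print(name)  # after the loop, name still holds the last item: 'Alice'

count = 0
for item in []:   # empty list: nothing to iterate over
    count += 1    # never executed
print(count)      # 0
```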

### 3.4.2 The `while` Loop

---

When you don't know in advance how many iterations you need and instead want to keep looping until a condition is met, use the `while` loop.

__WARNING:__ Be careful not to create an endless loop!

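```python
condition = True

while condition:
    # perform some type of operation, then update the condition
    condition = False  # without this update, the loop would never end
```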

![flowchart](http://imgs.xkcd.com/comics/flowchart.png)

```python
age = 30

while age < 50:
    print('I am {current_age} years old.'.format(current_age=age))
    age += 1
```
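Here is a case where the number of iterations is not known in advance: counting how many years it takes a balance to double at a fixed interest rate. The starting balance and 5% rate are arbitrary illustration values.

```python
balance = 100.0
rate = 0.05
years = 0

while balance < 200.0:      # stop once the balance has doubled
    balance *= 1 + rate
    years += 1              # updating state on each pass is what lets the loop end

print(years)                # 15
```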

### 3.5 List Comprehensions

---

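A list comprehension is a compact way to build a new list by iterating over an iterable, optionally transforming or filtering each item along the way.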

```python
result = [x for x in range(0, 20, 2)]
print(result)
```

```python
students = [('Brent', 'Smith'), ('Bob', 'LastName'), ('FirstName', 'Blah')]
full_names = [first + ' ' + last for first, last in students]
print(full_names)
```
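A list comprehension is shorthand for a `for` loop that appends to a new list. The sketch below writes out the loop equivalent of the first example, then adds an optional `if` clause to filter items and a transformation applied to each item.

```python
# Loop equivalent of: result = [x for x in range(0, 20, 2)]
result = []
for x in range(0, 20, 2):
    result.append(x)

# Comprehension with a filter: keep only the even numbers below 20.
evens = [x for x in range(20) if x % 2 == 0]
print(result == evens)  # True

# Comprehension with a transformation: square each even number.
squares = [x ** 2 for x in evens]
print(squares[:4])      # [0, 4, 16, 36]
```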

### 3.6 The In-Class Project (from last time)

---

Last time, I had you download a file, read the contents, manipulate the contents, and then compute the average. The code below shows how compactly you can do all of this without leaving the Python interpreter. In the next lecture, we will see how to analyze and visualize data.

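```python
import urllib.request, json
url = 'http://services.swpc.noaa.gov/products/solar-wind/plasma-2-hour.json'
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())
# Skip the header row, convert column index 3 (temperature) to float,
# and skip missing (null) readings, if the feed contains any.
temperatures = [float(entry[3]) for entry in data[1:] if entry[3] is not None]
average = sum(temperatures) / len(temperatures)
average
```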
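To try the averaging step without a network connection, here is a sketch that runs the same logic on a small made-up payload. It assumes the feed may contain `null` entries (which `json.loads` turns into `None`) and skips them, since `float(None)` would raise a `TypeError`; the timestamps and values are placeholders.

```python
import json

# Hypothetical payload: a header row plus three data rows, one with a missing temperature.
raw = b'''[["time_tag","density","speed","temperature"],
["t0","5.1","420.3","90000"],
["t1","4.8","415.8",null],
["t2","5.0","430.1","110000"]]'''

rows = json.loads(raw.decode())[1:]  # drop the header row
temperatures = [float(row[3]) for row in rows if row[3] is not None]
average = sum(temperatures) / len(temperatures)
print(average)  # 100000.0
```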