# Class 2: Data in Python (Applications)

![logo](http://www.nesdis.noaa.gov/sites/default/files/assets/images/discovr_first_year.jpg)

### Semester-Long Project (Homework Results)

---

In the homework from last week, you were to do some research about the DSCOVR mission and the data it retrieves. Within this lecture, you will learn how to read data into Python, but let's first review the homework so everyone is on the same page.

1. There are 3 instruments on board DSCOVR:

   1. Enhanced Polychromatic Imaging Camera (EPIC)  
   ![epic](http://www.nesdis.noaa.gov/sites/default/files/assets/images/epic_big.jpg)
   
   2. National Institute of Standards and Technology Advanced Radiometer (NISTAR)  
   ![nistar](http://www.nesdis.noaa.gov/sites/default/files/assets/images/nistar_big.jpg)
   
   3. Plasma-Magnetometer (PlasMag)  

2. DSCOVR orbits the Sun-Earth Lagrange point 1 (L1).
![orbit](http://www.nasa.gov/sites/default/files/image1-dscovrl1-orbit.jpg)
![orbit](http://www.nesdis.noaa.gov/sites/default/files/assets/images/point_of_lagrange1_big.jpg)

We will use this lecture to download data from DSCOVR and open it within Python.

### 2.1.1 What We've Covered Thus Far

---

* Numerical Data Types
* Comments
* A little bit about Strings

### 2.1.2 Data Retrieval

---

The first obstacle in this semester-long project is obtaining data from the DSCOVR satellite. Because the mission covers two distinct fields, each with its own data formats, we will eventually look at both. First, let's look at a data format we are already familiar with: ASCII. The space science portion of the mission provides freely available data in files covering different time spans. We will start small, with the most recent 2 hours of plasma data.

__Objective:__ Navigate to the following url and download the ASCII (JSON) file manually for the most recent 2-hour plasma data.

[http://services.swpc.noaa.gov/products/solar-wind/plasma-2-hour.json](http://services.swpc.noaa.gov/products/solar-wind/plasma-2-hour.json)

Other sources of data may come in the format of imagery, binary files, or various other formats. As a scientific programmer, you must discover or learn what tools are needed for each relevant data format. Here, we know that ASCII is just text, so we already have (some) tools to manipulate this data by using string manipulation.

### 2.1.3 Simple File I/O

---

Python comes with multiple libraries for reading ASCII files, such as files of comma-separated values or large collections of numbers. Here, we will see the most basic way to open a file and read in its contents.

In [None]:
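f = open('filename.txt', 'r')  # open the file in read ('r') mode
data = f.read()                # read the entire contents into one string
f.close()                      # close the file stream when finished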

The above code is just a placeholder for you to read in your plasma data from the JSON/ASCII file you just retrieved above. Let's perform this now.

In [13]:
f = open('/Users/ebsmith2/Desktop/plasma-2-hour.json', 'r')
data = f.read()
f.close()
print(type(data))

<class 'str'>


In the code section above, we open a file stream, read from that stream, and then close it. I call it a stream because once a file is opened, it remains available for further operations until it is finally closed.

__Important:__ Don't forget to close your files!

Python actually has a helpful construct to aid in this file I/O procedure. With the `with` statement, Python automatically closes the file stream when the indented block ends.

In [None]:
filename = '/Users/ebsmith2/Desktop/plasma-2-hour.json'
with open(filename) as f:
    data = f.read()

Notice the `open` call in the first line of the `with` statement. It binds the file stream to the variable `f` just like we did above, but using `as` instead of the equals sign. Because no mode is given, the file is opened in the default read mode. As mentioned above, this section of code automatically closes the file stream without us having to remember to do so.
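One way to convince yourself that the `with` statement really closes the file is to check the stream's `closed` attribute. The sketch below is only an illustration: it first writes a small temporary file (the name `example.txt` and its contents are made up here) so that it runs on any machine, rather than relying on the path to the plasma file.

```python
import os
import tempfile

# Create a small throwaway file so the example runs on any machine.
# The file name and its contents are made up for illustration.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, 'example.txt')
with open(filename, 'w') as f:
    f.write('hello, DSCOVR')

# Read the file back using the with statement.
with open(filename) as f:
    data = f.read()
    closed_inside = f.closed   # the stream is still open inside the block

closed_outside = f.closed      # the stream is closed once the block ends

print(data)
print(closed_inside, closed_outside)
```

Inside the block `f.closed` is `False`; after the block it is `True`, even though `f.close()` was never called explicitly.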

Files can be opened using different modes such as reading, writing, and appending. You can read more about them in the official documentation here: [http://docs.python.org/3/library/functions.html#open](http://docs.python.org/3/library/functions.html#open)
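As a rough sketch of the three modes named above, the example below writes to, appends to, and then reads back a temporary file. The file name `modes_demo.txt` and its contents are hypothetical; `'w'`, `'a'`, and `'r'` are the standard mode strings described in the documentation linked above.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' opens for writing, creating the file or erasing existing contents.
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' opens for appending, so new text is added to the end of the file.
with open(path, 'a') as f:
    f.write('second line\n')

# 'r' opens for reading; it is also the default when no mode is given.
with open(path, 'r') as f:
    contents = f.read()

print(contents)
```

Be careful with `'w'`: opening an existing file in write mode discards whatever it held before.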

# In-Class Exercise

Now that we can read data in from a file, we want to be able to use this numerical data within Python for calculations. Through this series of steps, you will open a file, read the contents of that file, and then use string manipulation and data type casting to obtain the average plasma temperature for the week.

I haven't taught you everything you need to do the following, but the official Python documentation, your neighbors/classmates, and Google are your friends.

1. Manually download the 7-day plasma JSON file from here: [http://services.swpc.noaa.gov/products/solar-wind/](http://services.swpc.noaa.gov/products/solar-wind/).
2. Open the file within Python and read the data contents.
3. Using string manipulation, split the data into rows of data (i.e., each data entry is one row).
4. Again using string manipulation (_hint:_ string slicing), obtain only the data for the temperature column.
5. Using data type casting, cast each of these values as floating point numbers.
6. Calculate the average of all of these values (_hint:_ for loops).
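
The building blocks named in the steps above can be tried on a tiny made-up string first. The sketch below is not the plasma file and does not follow its layout; the text and the position of the number in each row are invented purely to show `split`, slicing, `float`, and a `for` loop working together.

```python
# A made-up, three-row text sample (NOT the layout of the plasma file).
text = 'row1 10.5\nrow2 20.0\nrow3 30.5'

# Split the text into rows at each newline character.
rows = text.split('\n')

total = 0.0
for row in rows:
    # Slice off the first five characters ('rowN ') to keep only the number.
    number_text = row[5:]
    # Cast the remaining text to a floating point number and add it up.
    total += float(number_text)

# The average is the sum divided by the number of rows.
average = total / len(rows)
print(average)
```

The real file will need different split characters and slice positions, so inspect its contents before deciding how to cut it up.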

In our next class, I will cover how one might automate these types of tasks.