# Introduction to Visualization In Action - Part 1

Purpose: This notebook will introduce you to various plotting "packages" currently supported in the Python ecosystem and to functionalities you normally run into when dealing with space science datasets.


## A Quick Review


### What is Python and why are we using it?

Python is an object-oriented programming language. This means it is built around 'objects', which bundle data together with methods (functions attached to the object) that you can call to work with that data. We've seen this already when we used pandas dataframes (objects) to analyze data. 
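
As a small illustration of what "object" means here, the sketch below builds a tiny dataframe from made-up numbers (they are not from our dataset) and then uses one attribute and one method of that object.

```python
import pandas as pd

# A DataFrame is an object: it bundles data with attributes and methods.
# The values below are invented purely for illustration.
df = pd.DataFrame({"hour": [0, 1, 2], "sunspots": [10, 12, 11]})

print(type(df))               # the class this object belongs to
print(df.shape)               # an attribute: (number of rows, number of columns)
print(df["sunspots"].mean())  # a method: computes the average of one column
```

Attributes (like `shape`) describe the object, while methods (like `mean()`) do something with it, which is why methods end in parentheses.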


### What is Jupyter?

Jupyter is an interactive environment (this "notebook") where we can explore how a programming language, i.e. Python, works. You can "run" one cell at a time by hitting shift-enter OR by hitting run after selecting a cell. 

To edit a cell: double click. 
To make a text box cell: use the Cell -> Markdown option

## What will we do in this notebook: 

In this notebook we will be using a solar dataset (related to the sun) to understand the long-term behavior of the sun and its impact on the Earth. Let's begin!

-

-

-

-

-

## Section 1. The Physics 

### We will be using solar data to visualize both the sun, and space weather.

*What is space weather?*
- "Space-weather events are naturally occurring phenomena that **have the potential to disrupt** electric power systems; satellite, aircraft, and spacecraft operations; telecommunications; position, navigation, and timing services; and other technologies and infrastructures..." Source - [National Space Weather Action Plan](https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/final_nationalspaceweatheractionplan_20151028.pdf)

- "Space weather refers to the **environmental conditions in Earth's magnetosphere (e.g. magnetic environment), ionosphere and thermosphere (e.g. upper atmosphere) due to the Sun and the solar wind** that can influence the functioning and reliability of spaceborne and ground-based systems and services or endanger property or human health." Source - [European Space Agency](http://swe.ssa.esa.int/what-is-space-weather)

See [video here](./Images/Example_GeoSpaceWeather.mp4) on a geomagnetic storm. 

<img src="./Images/SpaceWeatherNOAA.jpg" alt="Drawing2" width="2000px"/>

Image Source: [NOAA](http://www.noaa.gov/explainers/space-weather-storms-from-sun)

## Section 2. Reading in Required Package and Exploring Our Data

#### What is Matplotlib?

Matplotlib is a package that enables visualization and graphics in Python. While we are using Jupyter to demonstrate this package, we will also be saving figures in publication-ready formats so that you can see how to use this in your work. 

Matplotlib, like most other Python packages, has amazing documentation. You can check it out [online](https://matplotlib.org/).


#### Why are we using these? 

Matplotlib is the basic plotting package for Python. We will see a few more packages (for geo data and statistical visualization) later. 

In [None]:
import matplotlib.pyplot as plt  #for basic plotting

import pandas as pd              #for analyzing/reading data
import numpy  as np              #for analyzing/reading data


import matplotlib.image as mpimg #for reading image data

In [None]:
#Let's start by exploring what the sun looks like

In [None]:

#this is a built in function for reading image data
quietSun  = mpimg.imread('./Data/SolarMinImages/20180105_022906_512_0304.jpg')
activeSun = mpimg.imread('./Data/SolarMaxImages/20130317_044532_512_0304.jpg')


In [None]:
#now let's quickly take a look at them

plt.imshow(quietSun) #this is a quick way to look at image data

In [None]:
plt.imshow(activeSun)
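
It can help to know what `imread` actually hands back: the image is a NumPy array of pixel values, which is why `imshow` can draw it. The sketch below uses a randomly generated stand-in array (not one of our solar images; the 64 x 64 size is an arbitrary assumption) to show how you might inspect such an array and display it without axis ticks.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for an image array: 64 x 64 pixels with 3 color channels (red, green, blue).
# This is random noise, used only so the example runs without the image files.
rng = np.random.default_rng(0)
fakeImage = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

print(fakeImage.shape)  # (rows, columns, color channels)
print(fakeImage.dtype)  # data type of each pixel value

fig, ax = plt.subplots()
ax.imshow(fakeImage)    # draw the array as an image
ax.axis('off')          # hide the pixel-coordinate ticks
```

You can try the same `.shape` and `.dtype` checks on `quietSun` and `activeSun`.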

### Activity: 

These images were taken by the Solar Dynamics Observatory. The first one (the quiet sun) is from 2018 and the second (the active sun) is from 2013. Why do you think they look different? I'm going to put you in a Zoom room for a short time to quickly discuss with your neighbor what you think might be different.

-

-

-

-


## Section 3. Load More Data 

We now have a qualitative understanding that the sun's behavior changes through time. Let's look at more quantitative measures of this.

We are going to read in OMNI data, which is a composite dataset made of many different measurements we use to understand solar impacts on Earth's space environment. 

Feel free to go ahead and look in the ./Data/OMNI/ folder at one of the .lst files. This is a text file with various information that we want to use. 

But how will we read this in? 

Yesterday we were introduced to pd.read_csv(); today we are going to see what else we can do with this command.

In [None]:
#reading in OMNI data

#our definition of the column names 
colNames = ['YEAR', 'DOY', 'Hour', 'BX', 'BY', 'BZ', 'FlowPressure', 'Ey', 'Kp', 
            'SunspotNumber', 'Dst', 'f10.7_index']

#here we read the hourly-cadence OMNI data, declare the separator to be any run of whitespace, 
#the names of the columns to be the ones we defined above, and the dates to be the first three columns
#(note the raw string r'\s+' and the boolean True rather than the string 'True')
hourOmni = pd.read_csv('./Data/OMNI/omni2_Hourly1980_2018.lst', sep = r'\s+', names = colNames,
                          parse_dates = {'Datetime': colNames[0:3]}, keep_date_col = True)

#here we are redefining the index to be a datetime index
#the format reads as: four digit year, day of year, hour
hourOmni.index = pd.to_datetime(hourOmni['Datetime'], format = '%Y %j %H')
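
Newer pandas releases warn about the `parse_dates` dictionary and `keep_date_col` options, so here is a sketch of another way to build the same kind of datetime index by hand. It uses a few made-up rows in the YEAR / DOY / Hour layout rather than the real file, and the names `demo` and `stamp` are just for this illustration.

```python
import pandas as pd

# Made-up rows in the same YEAR / DOY / Hour layout as the OMNI file
demo = pd.DataFrame({'YEAR': [1980, 1980, 2018],
                     'DOY':  [1, 60, 365],
                     'Hour': [0, 12, 23],
                     'Kp':   [10, 23, 7]})

# Glue the three columns into one string per row, zero padded so that
# the day of year is always 3 digits and the hour is always 2 digits
stamp = (demo['YEAR'].astype(str) + ' '
         + demo['DOY'].astype(str).str.zfill(3) + ' '
         + demo['Hour'].astype(str).str.zfill(2))

# %Y = four digit year, %j = day of year, %H = hour (24 hour clock)
demo.index = pd.to_datetime(stamp, format='%Y %j %H')

print(demo)
```

Day 60 of 1980 lands on 29 February because 1980 was a leap year, which is the kind of bookkeeping `%j` takes care of for us.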


So that was a hefty command. Let's look at what reading this WITHOUT this fancy command looks like.

In [None]:
hourOmniBasic = pd.read_csv('./Data/OMNI/omni2_Hourly1980_2018.lst') #read in without fancy cleaning

In [None]:
print(hourOmniBasic.head())

In [None]:
print(hourOmni.head())

One of the things we also changed was the index. Let's take a look at those.

In [None]:
print(hourOmniBasic.index)

In [None]:
print(hourOmni.index)

In [None]:
#if you ever want to know MORE about a function you can search online, 
#but Python also has built-in help like this -

help(pd.read_csv)

### Discussion: Do you see the difference between the two versions of reading in this file?

-

-

-

-

## Section 4. Clean the data

Sometimes in real datasets we will have missing or replaced values. In the OMNI dataset, missing data are replaced with fill values made of repeated 9s (for example 99.9 or 999.99). For example:

In [None]:
hourOmni.tail() #notice how I didn't write print here? This is specific to Jupyter notebooks
                #some commands will print into the notebook without you having to SPECIFICALLY 
                #tell it to print the output

When we analyze this data we DON'T want these fill values of 9s cluttering our averages or plots. One way to handle this is by replacing these values with NaNs (NaN stands for "not a number").

In [None]:
hourOmni.replace(to_replace = [99, 99.9, 99.99, 999, 999.9, 999.99], value = np.nan, inplace = True)
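
One thing to keep in mind: the replace above looks at every column, so a genuine value such as day of year 99 or a sunspot number of 99 is also turned into NaN. A more careful option is to replace a specific fill value in each column. The sketch below shows the idea on made-up data; the fill values in `fillValues` are assumptions for illustration and should be checked against the OMNI format description before use.

```python
import numpy as np
import pandas as pd

# Made-up rows; 999.9, 99 and 999 play the role of "missing" markers here
demo = pd.DataFrame({'DOY':           [98, 99, 100],
                     'BZ':            [-1.2, 999.9, 0.4],
                     'Kp':            [13, 99, 20],
                     'SunspotNumber': [99, 999, 45]})

# Hypothetical per-column fill values (assumed, not taken from the notebook)
fillValues = {'BZ': 999.9, 'Kp': 99, 'SunspotNumber': 999}

cleaned = demo.copy()
for column, fill in fillValues.items():
    # only this column's own fill value is turned into NaN
    cleaned[column] = cleaned[column].replace(fill, np.nan)

print(cleaned)
```

Here day of year 99 and the real sunspot number of 99 are both left alone, while the three fill values become NaN.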

In [None]:
hourOmni #now let's see what this dataset looks like

### Discussion: What questions do you have on this?

-

-

-

-

## Section 5. Plotting

Now that our data is ready to look at, let's get started on some more serious visualizations.

In [None]:
#first let's start with the most basic of plots
plt.plot(hourOmni.index, hourOmni.SunspotNumber)

Here we are looking at how many sunspots are on the sun over time. The more active pictures from the Solar Dynamics Observatory we saw before tend to correspond to more sunspots on the sun. For example, the images we looked at were from 2013, a time when the sun was very active, and from 2018, a time when it was not.

This isn't a very engaging plot, however, not to mention that it is missing a title and axis labels. Let's spend some time cleaning this up.

In [None]:
#This wraps our plot above into a more generic format

fig = plt.figure() #make a figure template
fig.suptitle('Long Duration Solar Behavior', fontsize = 20) #add a title and specify what SIZE the font should be

#set up labels for the axes
plt.xlabel('Year',               fontsize = 20)
plt.ylabel('Number of Sunspots', fontsize = 20)

plt.plot(hourOmni.index, hourOmni.SunspotNumber, color = 'grey', lw = 2.0) #make the color grey and linewidth 2

#add a grid
plt.grid(color='gray', linestyle='dashed')

#let's make the fontsize of the ticks the same
plt.tick_params(labelsize = 14)


This is one way of plotting in matplotlib. One of the very confusing parts of learning Python is that there are often many ways of doing exactly the same thing. Let me show you another way to generate the exact same plot, this time using "axes" and "subplots".

In [None]:
fig = plt.figure() #make a figure template

fig.suptitle('Long Duration Solar Behavior', fontsize = 20) #add a title and specify what SIZE the font should be
ax  = fig.add_subplot() #NEW - add a subplot called ax (for axes)
    
#set up labels for the axes
ax.set_xlabel('Year',               fontsize = 20) #notice now we are using ax instead of plt?
ax.set_ylabel('Number of Sunspots', fontsize = 20)

ax.plot(hourOmni.index, hourOmni.SunspotNumber, color = 'grey', lw = 2.0) #make the color grey and linewidth 2

#add a grid
ax.grid(color='gray', linestyle='dashed')

#let's make the fontsize of the ticks the same
ax.tick_params(labelsize = 14)


### Discussion: Let's pause here, why do you think it might be advantageous to use a different way to plot something if it's going to result in the same end figure?

-

-

-

-

-

-

-

## Section 6. Plotting - Leaning Into Axes

Let's see some other things that using axes in this way enables.

In [None]:
fig = plt.figure() #make a figure template

fig.suptitle('Long Duration Solar Behavior', fontsize = 20) #add a title and specify what SIZE the font should be
ax1  = fig.add_subplot() 
ax2  = ax1.twinx() #here we are making a NEW axes that shares the x-axis with ax1
    
#set up labels for the axes
ax1.set_xlabel('Year',               fontsize = 20) #now we set the labels of all the axes
ax1.set_ylabel('Number of Sunspots', fontsize = 20, color = 'grey')
ax2.set_ylabel('f10.7 Standard Flux Units (sfu)', fontsize = 20, color = '#983b59', labelpad = 20, 
               rotation = 270) #note we added a color and a labelpad, and rotated the label


ax1.plot(hourOmni.index, hourOmni.SunspotNumber, color = 'grey',    lw = 2.0) #make the color grey and linewidth 2
ax2.scatter(hourOmni.index, hourOmni['f10.7_index'], color = '#983b59', #plot f10.7 so the data match the ax2 label
         alpha = 0.3, s = 0.2) #make the color reddish, a scatter plot, and slightly transparent (alpha)
                               #you can find more hex code colors (the #number format) at color-hex.com

#add a grid
ax1.grid(color='gray', linestyle='dashed')

#let's make the fontsize of the ticks the same
ax1.tick_params(labelsize = 14)
ax2.tick_params(labelsize = 14)
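
Twin axes are one option the axes interface opens up; another is giving each quantity its own panel. The sketch below is only an illustration on synthetic sine-wave data (not the OMNI values), and the names `axTop` and `axBottom` are made up for this example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-ins for two time series that rise and fall together
times    = pd.date_range('1980-01-01', periods=200, freq='D')
sunspots = 100 + 50 * np.sin(np.linspace(0, 6, 200))
flux     = 120 + 40 * np.sin(np.linspace(0, 6, 200))

# two stacked panels that share the same x-axis
fig, (axTop, axBottom) = plt.subplots(nrows=2, sharex=True, figsize=(10, 7))

axTop.plot(times, sunspots, color='grey', lw=2.0)
axTop.set_ylabel('Number of Sunspots', fontsize=14)

axBottom.plot(times, flux, color='#983b59', lw=2.0)
axBottom.set_ylabel('f10.7 (sfu)', fontsize=14)
axBottom.set_xlabel('Year', fontsize=14)

# each axes object is styled on its own
for ax in (axTop, axBottom):
    ax.grid(color='gray', linestyle='dashed')
```

Because every panel is its own object, it is always clear which plot a given command is changing.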


### Discussion: What questions do you have about this?

-

-

-

-

-


## Section 7. Saving Figures 

Now, if we want to save this figure, say for a report to your mentor, we need to consider a few things. 

First, the resolution of the figure (how clear it will look), and second, its size. Let's try some different things here. 

In [None]:
fig = plt.figure(figsize = (10, 7)) #make a figure template, note we now have a size

fig.suptitle('Long Duration Solar Behavior', fontsize = 20) #add a title and specify what SIZE the font should be
ax1  = fig.add_subplot() 
ax2  = ax1.twinx() #here we are making a NEW axes that shares the x-axis with ax1
    
#set up labels for the axes
ax1.set_xlabel('Year',               fontsize = 20) #now we set the labels of all the axes
ax1.set_ylabel('Number of Sunspots', fontsize = 20, color = 'grey')
ax2.set_ylabel('f10.7 Standard Flux Units (sfu)', fontsize = 20, color = '#983b59', labelpad = 20, 
               rotation = 270) #note we added a color and a labelpad, and rotated the label


ax1.plot(hourOmni.index, hourOmni.SunspotNumber, color = 'grey',    lw = 2.0) #make the color grey and linewidth 2
ax2.scatter(hourOmni.index, hourOmni['f10.7_index'], color = '#983b59', #plot f10.7 so the data match the ax2 label
         alpha = 0.3, s = 0.2) #make the color reddish, a scatter plot, and slightly transparent (alpha)
                               #you can find more hex code colors (the #number format) at color-hex.com

#add a grid
ax1.grid(color='gray', linestyle='dashed')

#let's make the fontsize of the ticks the same
ax1.tick_params(labelsize = 14)
ax2.tick_params(labelsize = 14)

Let's now see some saving options. 

In [None]:
fig.savefig('ExampleFig_Basic.png') #generic save as an image

fig.savefig('ExampleFig_HighRes.png', dpi = 300) #save as a high resolution image (dpi = 300 is usually a good lower limit)

fig.savefig('ExampleFig_Vector.pdf') #save as a vector pdf (it stays sharp at any zoom level, so dpi matters much less)
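
To make the link between figure size, dpi, and the saved image concrete, here is a small sketch that saves a simple stand-in figure at two resolutions and reads the files back in. The file names and the temporary folder are made up for this example, so it will not overwrite the figures saved above.

```python
import os
import tempfile
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# a simple stand-in figure, 10 inches wide and 7 inches tall
fig = plt.figure(figsize=(10, 7))
ax  = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 4], color='grey', lw=2.0)

# save into a temporary folder (hypothetical file names)
outDir   = tempfile.mkdtemp()
lowPath  = os.path.join(outDir, 'Demo_100dpi.png')
highPath = os.path.join(outDir, 'Demo_300dpi.png')

fig.savefig(lowPath,  dpi=100)  # 100 dots (pixels) per inch
fig.savefig(highPath, dpi=300)  # 300 dots (pixels) per inch

# read the images back and compare their pixel dimensions and file sizes
lowShape  = mpimg.imread(lowPath).shape
highShape = mpimg.imread(highPath).shape
print(lowShape[:2], highShape[:2])                           # (rows, columns) of each image
print(os.path.getsize(lowPath), os.path.getsize(highPath))   # file sizes in bytes
```

The number of pixels is the size in inches times the dpi, so a 10 x 7 inch figure at 300 dpi is 3000 pixels wide and 2100 pixels tall. Higher dpi gives a sharper image at the cost of a larger file.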

### Discussion: Let's take a look at the figures we just saved (download and open, or open by double clicking) and discuss some trade offs.

-

-

-

-

-


Next we will move to an activity in IntroVizzies_2.ipynb