# Introduction to Data Visualization in Python

* [Multiple Plots on single axis](#multiple) 
    * [Using subplot()](#subplot)
* [Plotting 2D arrays](#2D)
* [Visualizing Multivariate Data](#Visualization)
* [Working with Images](#image)  
    * [Pseudocolor plot from image data](#image2)
    * [Rescaling pixel intensities](#image3)

<a id='multiple'></a>
## Multiple Plots on single axis
The dataset here comes from records of undergraduate degrees awarded to women in a variety of fields from 1970 to 2011. You can compare trends in degrees most easily by viewing two curves on the same set of axes.

In [258]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib auto

Using matplotlib backend: MacOSX


In [259]:
df = pd.read_csv("percent-bachelors-degrees-women-usa.csv")

In [260]:
df.head()

Unnamed: 0,Year,Agriculture,Architecture,Art and Performance,Biology,Business,Communications and Journalism,Computer Science,Education,Engineering,English,Foreign Languages,Health Professions,Math and Statistics,Physical Sciences,Psychology,Public Administration,Social Sciences and History
0,1970,4.229798,11.921005,59.7,29.088363,9.064439,35.3,13.6,74.535328,0.8,65.570923,73.8,77.1,38.0,13.8,44.4,68.4,36.8
1,1971,5.452797,12.003106,59.9,29.394403,9.503187,35.5,13.6,74.149204,1.0,64.556485,73.9,75.5,39.0,14.9,46.2,65.5,36.2
2,1972,7.42071,13.214594,60.4,29.810221,10.558962,36.6,14.9,73.55452,1.2,63.664263,74.6,76.9,40.2,14.8,47.6,62.6,36.1
3,1973,9.653602,14.791613,60.2,31.147915,12.804602,38.4,16.4,73.501814,1.6,62.941502,74.9,77.4,40.9,16.5,50.4,64.3,36.4
4,1974,14.074623,17.444688,61.9,32.996183,16.20485,40.5,18.9,73.336811,2.2,62.413412,75.3,77.9,41.8,18.2,52.6,66.1,37.3


In [261]:
year = df.Year
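# Select each column as a Series (single brackets) so that .max() and .idxmax() return scalars later on
physical_sciences = df['Physical Sciences']
computer_science = df['Computer Science']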

In [262]:
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')

# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')

[<matplotlib.lines.Line2D at 0x125506940>]

It looks like, for the last 25 years or so of the data, women have earned a larger share of undergraduate degrees in the Physical Sciences than in Computer Science.

<a id="cell2"></a>


### Using axes()
Rather than overlaying line plots on common axes, you may prefer to plot different line plots on distinct axes.   
Calling **plt.axes([xlo, ylo, width, height])** creates a set of axes and makes it active, with its lower-left corner at coordinates (xlo, ylo) and with the specified width and height. All four values are fractions of the figure size, between 0 and 1. Note that these coordinates can be passed to plt.axes() in the form of a list or a tuple.

In [263]:
# Create plot axes for the first line plot
plt.axes([0.05, 0.05, 0.425, 0.9])

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')

# Create plot axes for the second line plot
plt.axes([0.525, 0.05, 0.425, 0.9])

# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')


# Display the plot
plt.show()


<a id='subplot'></a>
### Using subplot() (1)  
The command plt.axes() requires a lot of effort to use well because the coordinates of the axes need to be set manually. A better alternative is to use plt.subplot() to determine the layout automatically.  

Calling plt.subplot(m, n, k) creates a subplot grid of dimensions m by n and makes the kth subplot active (subplots are numbered starting from 1, row-wise from the top left corner of the subplot grid).
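The row-wise numbering can be checked with a minimal sketch. The 2 by 3 grid and the `positions` dictionary are illustrative choices, not part of the exercise, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

m, n = 2, 3  # illustrative grid: 2 rows, 3 columns

fig = plt.figure()
positions = {}
for k in range(1, m * n + 1):
    ax = plt.subplot(m, n, k)  # make the kth subplot active
    ax.set_title(f"k = {k}")
    # Read back the (row, column) cell that Matplotlib assigned to index k
    spec = ax.get_subplotspec()
    positions[k] = (spec.rowspan.start, spec.colspan.start)

plt.tight_layout()
print(positions)
```

Index 4 lands in the first column of the second row, which is what counting row-wise from the top left corner predicts.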

In [264]:
# Create a figure with 1x2 subplot and make the left subplot active
plt.subplot(1, 2, 1)

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')



Text(0.5,1,'Physical Sciences')

In [265]:
# Make the right subplot active in the current 1x2 subplot grid
plt.subplot(1, 2, 2)


# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')

# Use plt.tight_layout() to improve the spacing between subplots
plt.tight_layout() 
plt.show()



### Using subplot() (2)  
Here, you will make a 2×2 grid of subplots and plot the percentage of degrees awarded to women in Physical Sciences (using physical_sciences), in Computer Science (using computer_science), in Health Professions (using the 'Health Professions' column of df), and in Education (using the 'Education' column of df).

In [266]:
# Create a figure with 2x2 subplot layout and make the top left subplot active
plt.subplot(2,2,1)

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')



Text(0.5,1,'Physical Sciences')

In [267]:

# Make the top right subplot active in the current 2x2 subplot grid 
plt.subplot(2, 2, 2)

# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')



Text(0.5,1,'Computer Science')

In [268]:
# Make the bottom left subplot active in the current 2x2 subplot grid
plt.subplot(2,2,3)

# Plot in green the % of degrees awarded to women in Health Professions
plt.plot(year, df[['Health Professions']], color='green')
plt.title('Health Professions')



Text(0.5,1,'Health Professions')

In [269]:


# Make the bottom right subplot active in the current 2x2 subplot grid
plt.subplot(2,2,4)

# Plot in yellow the % of degrees awarded to women in Education
plt.plot(year, df[['Education']], color='yellow')
plt.title('Education')

# Improve the spacing between subplots and display them
plt.tight_layout()
plt.show()



### Using xlim and ylim

In [270]:
# Plot the % of degrees awarded to women in Computer Science and the Physical Sciences
plt.plot(year,computer_science, color='red') 
plt.plot(year, physical_sciences, color='blue')

# Add the axis labels
plt.xlabel('Year')
plt.ylabel('Degrees awarded to women (%)')



Text(212.588,0.5,'Degrees awarded to women (%)')

In [271]:
# Set the x-axis range
plt.xlim(1990, 2010)

# Set the y-axis range
plt.ylim(0, 50)

# Add a title
plt.title('Degrees awarded to women (1990-2010)\nComputer Science (red)\nPhysical Sciences (blue)')

# Save the image as 'xlim_and_ylim.png' before showing it
# (with some backends the figure is empty once plt.show() returns)
plt.savefig('xlim_and_ylim.png')

# Display the plot
plt.show()



### Using axis
plt.xlim() and plt.ylim() are useful for setting the axis limits individually. In this exercise, you will see how you can pass a 4-tuple to plt.axis() to **set limits for both axes at once**. For example, plt.axis((1980,1990,0,75)) would set the extent of the x-axis to the period between 1980 and 1990, and would set the y-axis extent from 0 to 75% of degrees awarded.
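As a rough check of the 4-tuple form, the sketch below applies the limits from the example above to a synthetic curve and reads them back. The `years` and `share` arrays are made-up stand-ins for the degree data, and the Agg backend is chosen only so that the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (not the degree data set)
years = np.arange(1970, 2012)
share = np.linspace(10, 60, years.size)

fig = plt.figure()
plt.plot(years, share)

# One call sets (xmin, xmax, ymin, ymax) and returns the resulting limits
limits = plt.axis((1980, 1990, 0, 75))

# The same limits are visible through plt.xlim() and plt.ylim()
x_limits = plt.xlim()
y_limits = plt.ylim()
print(limits, x_limits, y_limits)
```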

In [272]:
# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')

# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')



[<matplotlib.lines.Line2D at 0x12458e6d8>]

In [273]:
# Set the x-axis and y-axis limits
plt.axis([1990, 2010, 0, 50])

# Save the figure as 'axis_limits.png' before showing it
plt.savefig('axis_limits.png')

# Show the figure
plt.show()



### Using Legend
Legends are useful for distinguishing between multiple datasets displayed on common axes. Each dataset is drawn with its own line color or marker in a separate plot command. Passing the keyword argument **label** to the plotting function associates a string with that dataset for use in the legend.

![Screen%20Shot%202019-03-04%20at%202.10.26%20PM.png](attachment:Screen%20Shot%202019-03-04%20at%202.10.26%20PM.png)



In [274]:
# Specify the label 'Computer Science'
plt.plot(year, computer_science, color='red', label='Computer Science') 

# Specify the label 'Physical Sciences' 
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')


# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')



Text(0.5,1,'Undergraduate enrollment of women')

In [275]:
# Add a legend at the lower center
plt.legend(loc= 'lower center')



<matplotlib.legend.Legend at 0x11a82fc50>

### Using annotate()
It is often useful to annotate a simple plot to provide context. This makes the plot **more readable** and can **highlight specific aspects** of the data. Annotations like **text** and **arrows** can be used to emphasize specific observations.

To enable an arrow, set **arrowprops=dict(facecolor='black')**. The arrow will point to the location given by xy and the text will appear at the location given by xytext.

![Screen%20Shot%202019-03-04%20at%202.11.31%20PM.png](attachment:Screen%20Shot%202019-03-04%20at%202.11.31%20PM.png)

We will use plt.annotate() to mark the **peak**, the point at which enrollment of women in Computer Science reached its maximum and started declining.
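The lookup of the peak and the annotation call can be sketched on a small synthetic table. The `demo` DataFrame, its `Share` column, and its values are invented for illustration; only the `xy`, `xytext`, and `arrowprops` arguments come from the text above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for one column of the degree data
demo = pd.DataFrame({"Year": [1980, 1981, 1982, 1983, 1984],
                     "Share": [30.0, 34.5, 38.0, 36.0, 33.2]})

# Peak value and the year in which it occurs (both scalars)
peak_value = demo["Share"].max()
peak_year = demo.loc[demo["Share"].idxmax(), "Year"]

fig = plt.figure()
plt.plot(demo["Year"], demo["Share"], color="red", label="Share")

# The arrow points at xy; the text is placed at xytext
note = plt.annotate("Maximum",
                    xy=(peak_year, peak_value),
                    xytext=(peak_year + 1, peak_value + 1),
                    arrowprops=dict(facecolor="black"))
plt.legend(loc="lower right")
```

Selecting the column with single brackets keeps `max()` and `idxmax()` scalar, so `xy` is a plain pair of numbers.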

In [276]:
# Plot with legend as before
plt.plot(year, computer_science, color='red', label='Computer Science') 
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')
plt.legend(loc='lower right')

<matplotlib.legend.Legend at 0x124c43f98>

In [277]:
# Compute the maximum enrollment of women in Computer Science as a scalar: cs_max
cs_max = df['Computer Science'].max()

# Look up the year in which there was maximum enrollment of women in Computer Science: yr_max
yr_max = df.loc[df['Computer Science'].idxmax(), 'Year']

# Add a black arrow annotation
plt.annotate('Maximum', xy= (yr_max, cs_max), xytext = (yr_max+5, cs_max+5), arrowprops = dict(facecolor= 'black'))

Text(1988,42.1,'Maximum')

### Modifying styles
Matplotlib comes with a number of different stylesheets to customize the overall look of different plots. To activate a particular stylesheet you can simply call **plt.style.use()** with the name of the style sheet you want. To list all the available style sheets you can execute: **print(plt.style.available)**.
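A style can also be applied temporarily. The sketch below is one possible way to do that with `plt.style.context()`, which restores the previous settings when the `with` block ends. The plotted numbers are made up, and the Agg backend is used only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A few of the style names that this Matplotlib installation provides
print(sorted(plt.style.available)[:5])

grid_before = plt.rcParams["axes.grid"]

# Apply 'ggplot' only inside the with-block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1970, 1990, 2010], [10, 30, 20])  # made-up values
    ax.set_title("Styled with ggplot")
    grid_inside = plt.rcParams["axes.grid"]  # ggplot switches the grid on

# Outside the block the earlier setting is back
grid_after = plt.rcParams["axes.grid"]
print(grid_before, grid_inside, grid_after)
```

By contrast, plt.style.use() changes the settings for every plot created afterwards in the session.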

In [278]:
# Import matplotlib.pyplot
import matplotlib.pyplot as plt

# Set the style to 'ggplot'
plt.style.use('ggplot')

# Create a figure with 2x2 subplot layout
plt.subplot(2, 2, 1) 

# Plot the enrollment % of women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')





Text(0.5,1,'Physical Sciences')

In [279]:
# Plot the enrollment % of women in Computer Science
plt.subplot(2, 2, 2)
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')

# Add annotation at the maximum (use max() for the value and idxmax() for its row)
cs_max = df['Computer Science'].max()
yr_max = df.loc[df['Computer Science'].idxmax(), 'Year']
plt.annotate('Maximum', xy=(yr_max, cs_max), xytext=(yr_max+5, cs_max+5), arrowprops=dict(facecolor='black'))

# Plot the enrollment % of women in Health Professions
plt.subplot(2, 2, 3)
plt.plot(year, df[['Health Professions']], color='green')
plt.title('Health Professions')

# Plot the enrollment % of women in Education
plt.subplot(2, 2, 4)
plt.plot(year, df[['Education']], color='yellow')
plt.title('Education')

# Improve spacing between subplots and display them
plt.tight_layout()
plt.show()



<a id='2D'></a>
## Plotting 2D arrays

### Generating Meshes

In [280]:
# Generate two 1-D arrays: u, v
u = np.linspace(-2, 2, 41)
v = np.linspace(-1, 1, 21)


In [281]:
# Generate 2-D arrays from u and v: X, Y
X,Y = np.meshgrid(u, v)
# Compute Z based on X and Y
Z = np.sin(3*np.sqrt(X**2 + Y**2)) 
print(X.shape, Y.shape)
Z.shape

(21, 41) (21, 41)


(21, 41)

In [282]:
# Display the resulting image with pcolor()
plt.pcolor(Z)

# Save the figure to 'sine_mesh.png' before showing it
plt.savefig('sine_mesh.png')
plt.show()

### Contour & filled contour plots
In this exercise, you will visualize a 2-D array repeatedly using both plt.contour() and plt.contourf(). You will use plt.subplot() to display several contour plots in a common figure, using the meshgrid X, Y as the axes. For example, plt.contour(X, Y, Z) generates a default contour map of the array Z.
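A minimal sketch of that exercise follows. It rebuilds the same mesh as the previous section so that it is self-contained; the choice of 20 contour levels in the right-hand column is an assumption made for illustration, and the Agg backend is used only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Same mesh as in the "Generating Meshes" cells
u = np.linspace(-2, 2, 41)
v = np.linspace(-1, 1, 21)
X, Y = np.meshgrid(u, v)
Z = np.sin(3 * np.sqrt(X**2 + Y**2))

fig = plt.figure()

# Top left: default contour map
plt.subplot(2, 2, 1)
plt.contour(X, Y, Z)

# Top right: contour map with 20 levels (illustrative choice)
plt.subplot(2, 2, 2)
plt.contour(X, Y, Z, 20)

# Bottom left: default filled contour map
plt.subplot(2, 2, 3)
plt.contourf(X, Y, Z)

# Bottom right: filled contour map with 20 levels
plt.subplot(2, 2, 4)
plt.contourf(X, Y, Z, 20)

plt.tight_layout()
```

Because X and Y are passed explicitly, the axes are labeled in the units of u and v rather than in array indices.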



<a id='Visualization'></a>
## Visualizing Multivariate Data


In [283]:
df = pd.read_csv('auto-mpg.csv')

In [284]:
df.head()

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name,color,size,marker
0,18.0,6,250.0,88,3139,14.5,71,US,ford mustang,red,27.370336,o
1,9.0,8,304.0,193,4732,18.5,70,US,hi 1200d,green,62.199511,o
2,36.1,4,91.0,60,1800,16.4,78,Asia,honda civic cvcc,blue,9.0,x
3,18.5,6,250.0,98,3525,19.0,77,US,ford granada,red,34.515625,o
4,34.3,4,97.0,78,2188,15.8,80,Europe,audi 4000,blue,13.298178,s


### Using hist2d()

In [285]:
plt.hist2d(df.hp, df.mpg)
plt.colorbar()

# Add labels, title, and display the plot
plt.xlabel('Horse power [hp]')
plt.ylabel('Miles per gallon [mpg]')
plt.title('hist2d() plot')
plt.show()



### Using hexbin()

In [286]:
plt.hexbin(df.hp, df.mpg)
plt.colorbar()

# Add labels, title, and display the plot
plt.xlabel('Horse power [hp]')
plt.ylabel('Miles per gallon [mpg]')
plt.title('hexbin() plot')
plt.show()




<a id='image'></a>
## Working with Images 
    
### Loading and examining an image


In [287]:
img = plt.imread('astronaut.jpg')

In [288]:
img.shape

(3072, 3072, 3)

In [289]:
# Display the image
plt.imshow(img)

# Hide the axes
plt.axis('off')
plt.show()


<a id='image2'></a>
### Pseudocolor plot from image data

Image data comes in many forms and it is not always appropriate to display the available channels in RGB space. 

In [290]:
# Print the shape of the image
print(img.shape)


(3072, 3072, 3)


In [291]:
# Compute the sum of the red, green and blue channels: intensity
intensity = img.sum(axis =2)

# Print the shape of the intensity
print(intensity.shape)



(3072, 3072)


In [292]:
# Display the intensity with a colormap of 'gray'
plt.imshow(intensity, cmap = 'gray')

# Add a colorbar
plt.colorbar()

# Hide the axes and show the figure
plt.axis('off')
plt.show()


<a id='extent'></a>
### Extent and aspect
    


The ratio of the displayed height to the displayed width of one data unit is known as the **image aspect**, and the range used to label the x- and y-axes is known as the **image extent**. For plt.imshow(), the default aspect value of 'equal' keeps the pixels square, while 'auto' stretches the image to fill the axes. The extent is computed automatically from the shape of the array unless specified otherwise.
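The defaults can be read back from a tiny array. The sketch below is only illustrative: the 2 by 3 `data` array and the variable names are made up, and the Agg backend is used so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(6).reshape(2, 3)  # made-up image: 2 rows, 3 columns

fig, (ax_default, ax_custom) = plt.subplots(1, 2)

# Left: extent and aspect left at their defaults
im_default = ax_default.imshow(data)

# Right: explicit extent, and aspect='auto' so the image fills the axes
im_custom = ax_custom.imshow(data, extent=(-1, 1, -1, 1), aspect="auto")

# Extent is reported as (left, right, bottom, top)
default_extent = tuple(im_default.get_extent())
custom_extent = tuple(im_custom.get_extent())
print(default_extent, custom_extent)
print(ax_default.get_aspect(), ax_custom.get_aspect())
```

The default extent runs from -0.5 to one half less than the number of columns or rows, so that integer coordinates fall on pixel centers.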




In [293]:
# Specify the extent and aspect ratio of the top left subplot
plt.subplot(2,2,1)
plt.title('extent=(-1,1,-1,1),\naspect=0.5') 
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-1,1,-1,1), aspect=0.5)




<matplotlib.image.AxesImage at 0x129346e10>

In [294]:
# Specify the extent and aspect ratio of the top right subplot
plt.subplot(2,2,2)
plt.title('extent=(-1,1,-1,1),\naspect=1')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-1, 1, -1, 1), aspect=1)




<matplotlib.image.AxesImage at 0x12935eb38>

In [295]:
# Specify the extent and aspect ratio of the bottom left subplot
plt.subplot(2,2,3)
plt.title('extent=(-1,1,-1,1),\naspect=2')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-1,1, -1, 1), aspect=2)




<matplotlib.image.AxesImage at 0x12936e5f8>

In [296]:
# Specify the extent and aspect ratio of the bottom right subplot
plt.subplot(2,2,4)
plt.title('extent=(-2,2,-1,1),\naspect=2')
plt.xticks([-2,-1,0,1,2])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-2,2, -1, 1), aspect=2)




<matplotlib.image.AxesImage at 0x12935e400>

<a id='image3'></a>
### Rescaling pixel intensities
![bay.jpg](attachment:bay.jpg)


In [297]:
image = plt.imread('bay2.jpg')


In [298]:
# Extract minimum and maximum values from the image: pmin, pmax
pmin, pmax = image.min(), image.max()
print("The smallest & largest pixel intensities are %d & %d." % (pmin, pmax))

# Rescale the pixels: rescaled_image
# (convert to float first to avoid integer overflow; 255 is the largest 8-bit intensity)
rescaled_image = 255 * (image.astype(float) - pmin) / (pmax - pmin)
print("The rescaled smallest & largest pixel intensities are %.1f & %.1f." % 
      (rescaled_image.min(), rescaled_image.max()))


The smallest & largest pixel intensities are 114 & 208.
The rescaled smallest & largest pixel intensities are 0.0 & 255.0.
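The linear stretch can be tried on a small array without the image file. This is a sketch under stated assumptions: `rescale_to_uint8` is a hypothetical helper, not part of Matplotlib or NumPy, and the four pixel values are invented so that the minimum and maximum match the ones printed above.

```python
import numpy as np

def rescale_to_uint8(pixels):
    """Linearly stretch pixel values so the minimum maps to 0 and the maximum to 255."""
    pixels = np.asarray(pixels, dtype=float)  # float avoids uint8 overflow
    pmin, pmax = pixels.min(), pixels.max()
    if pmax == pmin:
        # A constant image has no contrast to stretch
        return np.zeros(pixels.shape, dtype=np.uint8)
    scaled = 255 * (pixels - pmin) / (pmax - pmin)
    return np.round(scaled).astype(np.uint8)

# Made-up low-contrast patch with intensities between 114 and 208
low_contrast = np.array([[114, 150], [180, 208]], dtype=np.uint8)
stretched = rescale_to_uint8(low_contrast)
print(stretched.min(), stretched.max())
```

Returning an 8-bit integer array keeps the result inside the 0 to 255 range that plt.imshow() accepts for integer RGB data.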


In [299]:
# Display the original image in the top subplot
plt.subplot(2,1,1)
plt.title('original image')
plt.axis('off')
plt.imshow(image)



<matplotlib.image.AxesImage at 0x129f054a8>

In [300]:
# Display the rescaled image in the bottom subplot
plt.subplot(2,1,2)
plt.title('rescaled image')
plt.axis('off')
# imshow expects float RGB data in [0, 1], so normalize before displaying
plt.imshow(rescaled_image / rescaled_image.max())

plt.show()
