## Programming for Data Analysis Project 2018

### Patrick McDonald G00281051

#### Problem statement

For this project you must create a data set by simulating a real-world phenomenon of your choosing. You may pick any phenomenon you wish – you might pick one that is of interest to you in your personal or professional life. Then, rather than collect data related to the phenomenon, you should model and synthesise such data using Python. We suggest you use the numpy.random package for this purpose.

Specifically, in this project you should:

* Choose a real-world phenomenon that can be measured and for which you could collect at least one-hundred data points across at least four different variables.
* Investigate the types of variables involved, their likely distributions, and their relationships with each other.
* Synthesise/simulate a data set as closely matching their properties as possible.
* Detail your research and implement the simulation in a Jupyter notebook – the data set itself can simply be displayed in an output cell within the notebook.
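The brief suggests the `numpy.random` package. The sketch below shows what a synthesised data set of one hundred points across four variables could look like. It assumes NumPy's `Generator` API (`np.random.default_rng`). The column names are borrowed from the buoy file used later, but every distribution and parameter is a hypothetical placeholder picked to illustrate the API. None of them are fitted to real measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical parameters: placeholder values chosen only to illustrate the
# numpy.random API, not fitted to any real buoy data.
N_POINTS = 100  # the brief asks for at least one hundred data points

rng = np.random.default_rng(seed=42)  # seeded so the output is reproducible

synthetic = pd.DataFrame({
    # Continuous, roughly symmetric variable: normal distribution
    "temp": rng.normal(loc=15.0, scale=1.0, size=N_POINTS),
    # Strictly positive, right-skewed variable: gamma distribution
    "wavht": rng.gamma(shape=4.0, scale=0.5, size=N_POINTS),
    # Bounded variable: uniform over [0, 360) degrees
    "wvdir": rng.uniform(low=0.0, high=360.0, size=N_POINTS),
    # Discrete, whole-number variable: Poisson counts
    "per": rng.poisson(lam=8, size=N_POINTS),
})

print(synthetic.shape)
print(synthetic.describe())
```

Choosing a different distribution family per variable type (symmetric, skewed, bounded, discrete) is one way to approach the second bullet. The real work is replacing these placeholders with distributions justified by the observed data.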



## What dataset to simulate?

For the purpose of this project, I shall extract some wave buoy data from the [M6 weather buoy](http://www.marine.ie/Home/site-area/data-services/real-time-observations/irish-weather-buoy-network) off the west coast of Ireland. I surf occasionally, and many surfers, like myself, use weather buoy data to predict when there will be decent waves to surf. There are many online resources that provide such information, but I thought this would be an enjoyable exploration of raw data that is used every day, worldwide!

# Import libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Downloaded hly62095.csv from https://data.gov.ie/dataset/hourly-data-for-buoy-m6
# The first 19 rows of the file hold a label legend, so skip them when reading.

df = pd.read_csv("hly62095.csv", skiprows=19, low_memory=False)

I downloaded hly62095.csv from https://data.gov.ie/dataset/hourly-data-for-buoy-m6 and opened it in VSCode. The file begins with a label legend, so I skipped rows 1-19 when reading it:

### Label legend

'''
1.  Station Name: M6
2.  Station Height: 0 M 
3.  Latitude:52.990  ,Longitude: -15.870
4. 
5. 
6.  date:	  -  Date and Time (utc)
7.  temp:  	  -  Air Temperature (C)	
8.  rhum:	  -  Relative Humidity (%)
9.  windsp:	  -  Mean Wind Speed (kt)
10. dir:	  -  Mean Wind 	Direction (degrees)
11. gust:	  -  Maximum Gust (kt)
12. msl:	  -  Mean Sea Level Pressure (hPa)
13. seatp:	  -  Sea Temperature (C)
14. per:	  -  Significant Wave Period (seconds)
15. wavht:	  -  Significant Wave Height (m)
16. mxwav: 	  -  Individual Maximum Wave Height(m)
17. wvdir:    -  Wave Direction (degrees)
18. ind:      -  Indicator    
19. 
20. date,temp,rhum,wdsp,dir,gust,msl,seatp,per,wavht,mxwave,wvdir
21. 25-sep-2006 09:00,15.2, ,8.000,240.000, ,1007.2,15.4,6.000,1.5, , 
22. 25-sep-2006 10:00,15.2, ,8.000,220.000, ,1008.0,15.4,6.000,1.5, ,......... 

'''
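The legend shows two things the reader has to handle. The header sits on row 20, and missing readings appear as a single space between commas. The sketch below checks both on a small in-memory stand-in for `hly62095.csv`: 19 hypothetical placeholder legend lines, followed by the header and the first two data rows shown in this section. It assumes that passing `na_values=[" "]` to `pd.read_csv` is enough to turn those blank cells into `NaN`. It also assumes that the dates follow the `%d-%b-%Y %H:%M` pattern seen above.

```python
import io

import pandas as pd

# Hypothetical stand-in for hly62095.csv: 19 placeholder legend lines, then the
# header row and the first two data rows shown in the legend excerpt.
legend = "\n".join(f"legend line {i}" for i in range(1, 20))
data = (
    "date,temp,rhum,wdsp,dir,gust,msl,seatp,per,wavht,mxwave,wvdir\n"
    "25-sep-2006 09:00,15.2, ,8.000,240.000, ,1007.2,15.4,6.000,1.5, , \n"
    "25-sep-2006 10:00,15.2, ,8.000,220.000, ,1008.0,15.4,6.000,1.5, , \n"
)
csv_text = legend + "\n" + data

df = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=19,       # skip the 19 legend lines so row 20 becomes the header
    na_values=[" "],   # treat single-space cells as missing (NaN)
    low_memory=False,
)
# Parse the "25-sep-2006 09:00" style timestamps into datetimes
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%Y %H:%M")

print(df.dtypes)
print(df)
```

Without `na_values`, columns containing the single-space cells may be read as text rather than numbers. That is worth checking with `df.dtypes` on the real file.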

# View the first 10 rows of the DataFrame
df.head(10)

|   | date | temp | rhum | wdsp | dir | gust | msl | seatp | per | wavht | mxwave | wvdir |
|---|------|------|------|------|-----|------|-----|-------|-----|-------|--------|-------|
| 0 | 25-sep-2006 09:00 | 15.2 |  | 8.000 | 240.000 |  | 1007.2 | 15.4 | 6.000 | 1.5 |  |  |
| 1 | 25-sep-2006 10:00 | 15.2 |  | 8.000 | 220.000 |  | 1008.0 | 15.4 | 6.000 | 1.5 |  |  |
| 2 | 25-sep-2006 11:00 | 15.0 |  | 10.000 | 220.000 |  | 1008.4 | 15.4 | 6.000 | 1.5 |  |  |
| 3 | 25-sep-2006 12:00 | 15.0 |  | 12.000 | 240.000 |  | 1009.0 | 15.4 | 6.000 | 1.0 |  |  |
| 4 | 25-sep-2006 13:00 | 15.0 | 87.000 | 11.000 | 280.000 | 16.000 | 1009.6 | 15.5 | 6.000 | 1.2 |  |  |
| 5 | 25-sep-2006 14:00 | 14.7 | 86.000 | 15.000 | 280.000 | 18.000 | 1010.0 | 15.5 | 5.000 | 1.2 |  |  |
| 6 | 25-sep-2006 15:00 | 13.9 | 85.000 | 15.000 | 270.000 | 20.000 | 1010.4 | 15.4 | 5.000 | 1.3 |  |  |
| 7 | 25-sep-2006 16:00 | 14.8 | 81.000 | 13.000 | 280.000 | 17.000 | 1010.8 | 15.4 | 5.000 | 1.4 |  |  |
| 8 | 25-sep-2006 17:00 | 14.8 | 76.000 | 12.000 | 280.000 | 18.000 | 1011.0 | 15.4 | 5.000 | 1.6 |  |  |
| 9 | 25-sep-2006 18:00 | 14.8 | 76.000 | 11.000 | 270.000 | 17.000 | 1011.6 | 15.4 | 6.000 | 1.8 |  |  |
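Several columns in these first rows are blank, for example `rhum`, `gust` and `mxwave`. Counting missing values per column shows how much of each variable is usable. The sketch below builds a small hand-typed frame from five of the columns in rows 0-4 above, with `NaN` for the blank cells. On the real `df`, the same `isna()` calls only work once the blank cells have been read as `NaN` rather than as text.

```python
import numpy as np
import pandas as pd

# A few columns from rows 0-4 of the output above; NaN marks blank cells.
sample = pd.DataFrame({
    "temp": [15.2, 15.2, 15.0, 15.0, 15.0],
    "rhum": [np.nan, np.nan, np.nan, np.nan, 87.0],
    "gust": [np.nan, np.nan, np.nan, np.nan, 16.0],
    "wavht": [1.5, 1.5, 1.5, 1.0, 1.2],
    "mxwave": [np.nan] * 5,
})

# Number of missing readings per column
missing_counts = sample.isna().sum()
# Share of missing readings per column, as a percentage
missing_pct = sample.isna().mean() * 100

print(pd.DataFrame({"missing": missing_counts, "percent": missing_pct}))
```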


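There are a significant number of missing data points, and it's a large sample. I'm going to explore this further and extract the relevant data for the first week of September 2018. At hourly resolution that week holds up to 7 × 24 = 168 rows, comfortably above the one hundred data points the brief asks for, which gives me enough data to explore and simulate for this project.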
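One way to extract that week is to parse the `date` strings into datetimes and keep rows inside a date range. The sketch below assumes "the first week" means 1 September 2018 00:00 up to, but not including, 8 September 2018. It uses a hypothetical stand-in frame with hourly timestamps written in the file's `25-sep-2006 09:00` style and placeholder wave heights, so no real buoy values appear.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the buoy DataFrame: hourly timestamps written in
# the file's "25-sep-2006 09:00" style, with placeholder wave heights.
times = pd.date_range("2018-08-30 00:00", "2018-09-10 23:00", freq=pd.Timedelta(hours=1))
df = pd.DataFrame({
    "date": times.strftime("%d-%b-%Y %H:%M").str.lower(),
    "wavht": np.linspace(1.0, 3.0, len(times)),  # placeholder values only
})

# Convert the text dates to datetimes so they can be compared
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%Y %H:%M")

# Keep readings from 1 September 2018 00:00 up to (not including) 8 September
mask = (df["date"] >= "2018-09-01") & (df["date"] < "2018-09-08")
first_week = df.loc[mask].reset_index(drop=True)

print(len(first_week))  # 7 days x 24 hourly readings in this complete stand-in
print(first_week["date"].min(), first_week["date"].max())
```

Using a half-open range (`>=` start, `<` end) avoids having to spell out the final hour. On the real data, the row count may come out below 168 if some hourly readings are absent.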