# Weather Py Analysis

## Introduction 
We were asked to generate a random list of at least 500 cities and investigate the relationships between the following variables:
- Latitude vs. Max Temp 
- Latitude vs. Humidity 
- Latitude vs. Cloudiness 
- Latitude vs. Wind Speed (mph)

## Data Collection 
To investigate these relationships, we generated the dataset as follows:
- We created two NumPy arrays containing 2500 latitude and longitude coordinates and zipped them together.
- We then queried CitiPy to find the nearest city to each coordinate pair and added each unique city to a list.
    - This gave us a list of 1422 unique cities to start with.
- The next step was to query the Open Weather Map (OWM) API to retrieve weather data for those unique cities.
    - Because OWM limits its API to 60 calls per minute, we segmented the 1422 cities into 24 API callset groups of up to 60 cities each.
    - For each city in a callset group, we queried the OWM API to check whether the city was in its dataset and, if so, retrieved a set of weather variables for that city.
    - Cities with successful API call results were stored in a new list.
- After running through all 1422 unique cities, we saved the city information from the successful API calls to a CSV (cities_in_owm.csv), and the individual successful/unsuccessful call responses to a separate CSV (api_results.csv).
- The results of our OWM API calls were the following:
    - Cities in dataset: 1244
    - Cities in Northern Hemisphere (positive latitudes): 881
    - Cities in Southern Hemisphere (negative latitudes): 354
    - Cities NOT in OWM: 178
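
The collection steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it uses the standard library's `random` module in place of NumPy, and `fake_lookup` is a hypothetical stand-in for the CitiPy nearest-city lookup so the example runs without network access or extra packages. The function names (`random_coordinates`, `unique_cities`, `call_groups`) are assumptions made for this sketch.

```python
import random


def random_coordinates(n, seed=None):
    """Return n (lat, lng) pairs drawn uniformly from the valid ranges."""
    rng = random.Random(seed)
    return [(rng.uniform(-90, 90), rng.uniform(-180, 180)) for _ in range(n)]


def unique_cities(coords, nearest_city):
    """Map each coordinate pair to its nearest city, keeping first-seen order.

    nearest_city is any callable (lat, lng) -> city name. In the notebook
    this role is played by the CitiPy lookup; here it is injected so the
    sketch stays self-contained.
    """
    ordered = []
    seen = set()
    for lat, lng in coords:
        name = nearest_city(lat, lng)
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


def call_groups(cities, group_size=60):
    """Split cities into consecutive groups of at most group_size items."""
    return [cities[i:i + group_size] for i in range(0, len(cities), group_size)]


def fake_lookup(lat, lng):
    """Hypothetical lookup for illustration: one 'city' per 10-degree cell."""
    return f"city_{int(lat // 10)}_{int(lng // 10)}"


coords = random_coordinates(2500, seed=0)
cities = unique_cities(coords, fake_lookup)
groups = call_groups(cities)

# 1422 cities in groups of 60 gives 23 full groups plus one group of 42.
print(len(call_groups(list(range(1422)))))  # 24
```

One way to stay under a per-minute call limit is to pause for about a minute after finishing each group before starting the next. Whether the notebook paused in exactly this way is an assumption.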
    
## Analysis

- Using our dataset of 1244 cities, we then investigated each of the questions that we were asked to analyze. 
- For each question, we generated a scatterplot of the dependent variable against the independent variable, latitude.
- Additionally, I added a second scatterplot of each relationship, with two changes:
    - I ran simple linear regressions for the dependent variable in northern hemisphere and southern hemisphere cities and plotted each regression line over the original data. In most of these cases, simple linear regression isn't the appropriate analysis, but I wanted to see:
        - If I could expand my programmatic control of matplotlib to include such an analysis, and
        - What a linear regression would return for such data.
    - I also reduced the alpha values of the markers so they wouldn't visually compete with the regression lines, but kept them partially visible so one could see that they were the same markers as in the main scatterplot and see how the lines cut through the data.
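
As a rough sketch of the per-hemisphere regression described above, the following fits an ordinary least squares line separately to cities with positive and negative latitudes. It is written in plain Python so it is self-contained; the notebook may well have used a library routine instead. The names `linear_fit` and `fit_by_hemisphere` and the sample values are made up for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_by_hemisphere(latitudes, values):
    """Fit one line to northern (lat > 0) and one to southern (lat < 0) points."""
    pairs = list(zip(latitudes, values))
    north = [(lat, v) for lat, v in pairs if lat > 0]
    south = [(lat, v) for lat, v in pairs if lat < 0]
    return {
        "north": linear_fit(*zip(*north)),
        "south": linear_fit(*zip(*south)),
    }


# Made-up sample: temperature drops one degree per degree away from the equator.
sample_lats = [-40, -20, -10, 10, 20, 40]
sample_temps = [50, 70, 80, 80, 70, 50]
fits = fit_by_hemisphere(sample_lats, sample_temps)
print(fits)
```

Each `(slope, intercept)` pair can then be used to draw a line across that hemisphere's latitude range on top of the faded scatter markers.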


### Latitude vs. Max Temperature

![latitude_vs_maxtemp.png](attachment:latitude_vs_maxtemp.png)

Looking at our basic plot, we see that the highest max temperatures are clustered around latitude 0, and that temperatures decrease as one moves farther north or south. Even though there are outliers (especially in the southern hemisphere), these findings show that cities around the equator are generally hotter than cities farther from it.

When we group the cities by latitude and take the mean max temp of each group, we see a clear trend of decreasing mean max temp as we go up in our latitude groups. We can see this in the notebook, where we display a table of means, and in the plot below:

![max_temp_means.png](attachment:max_temp_means.png) 
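
The grouped means could be computed along the following lines. This is a sketch under stated assumptions: the 20-degree band width, the function name `mean_by_latitude_band`, and the sample values are all hypothetical, since the grouping actually used in the notebook is not described here.

```python
from collections import defaultdict


def mean_by_latitude_band(latitudes, values, band_width=20):
    """Average values within latitude bands of band_width degrees.

    Returns {band_start: mean}, where each band covers
    [band_start, band_start + band_width).
    """
    buckets = defaultdict(list)
    for lat, value in zip(latitudes, values):
        band_start = int(lat // band_width) * band_width
        buckets[band_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Made-up sample values for illustration only.
band_lats = [-35, -25, -5, 5, 15, 45, 55]
band_temps = [60, 64, 82, 84, 80, 55, 45]
band_means = mean_by_latitude_band(band_lats, band_temps)
print(band_means)
```

The same helper applies unchanged to humidity, cloudiness, or wind speed by passing a different list of values.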



### Latitude vs. Humidity

![latitude_vs_humidity.png](attachment:latitude_vs_humidity.png)

Unlike our max temp findings, latitude and humidity do not show a clear linear relationship. Instead, the data follow a curve in which the lowest and highest latitudes show higher humidity levels than the mid-region latitudes.

![humidity_means.png](attachment:humidity_means.png)

### Latitude vs. Cloudiness

![latitude_vs_cloudiness.png](attachment:latitude_vs_cloudiness.png)

Cloudiness percentage seems to show a pattern similar to our humidity findings, where the lowest and highest latitudes tend towards a greater cloudiness percentage. Here, though, there is a sharper rise between the mid-latitudes and the upper latitudes.

![cloudiness_means.png](attachment:cloudiness_means.png)

### Latitude vs. Wind Speed (mph)

![latitude_vs_windspeed.png](attachment:latitude_vs_windspeed.png)

Wind speeds seem to increase as we go up in latitude, but the group means don't differ much in absolute terms, since they rise by only one or two miles per hour from one group to the next. The lowest means are just under 6 mph and the highest are just over 10 mph.

![winddspeed_means.png](attachment:winddspeed_means.png)