# Weather Analysis

This notebook summarizes the tasks performed as part of the Computer Infrastructure module in the Higher Diploma in Science in Data Analytics, ATU.\
The notebook itself is task 8 of 9. Reports for the other 8 tasks are provided below.\
First, import the packages required for the tasks and analysis.

```python
# Data frames.
import pandas as pd
```

## Directory Structure

Create a directory named `data` at the root of the `/computer-infrastructure` repository, and create two subdirectories: `timestamps` and `weather`.

```bash
$ mkdir data/
$ mkdir data/timestamps/
$ mkdir data/weather/
```
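
For readers who prefer to stay inside the notebook, the same layout could be created from Python. This is a minimal sketch using `pathlib`; the helper name `create_data_dirs` and its `root` parameter are hypothetical and not part of the original tasks.

```python
from pathlib import Path


def create_data_dirs(root: Path) -> list[Path]:
    """Create data/timestamps and data/weather under root (hypothetical helper)."""
    created = []
    for name in ("timestamps", "weather"):
        path = root / "data" / name
        # parents=True also creates the intermediate data/ directory;
        # exist_ok=True makes repeated calls harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_data_dirs(Path("."))` from the repository root would create both subdirectories, and running it a second time leaves them untouched.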

## Timestamps

Navigate to `data/timestamps`. Append the current date and time to `now.txt` ten times, and verify the `now.txt` file has the expected content.

```bash
$ cd data/timestamps
$ for i in {1..10}; do date >> now.txt; done
$ more now.txt
```

## Formatting Timestamps

Format the output of the `date` command as `YYYYmmdd_HHMMSS` and append it to `formatted.txt`.
Refer to the `date` manual page for more formatting options.

```bash
$ date +"%Y%m%d_%H%M%S" >> formatted.txt
$ man date
```

From the ```date``` manual:\
%Y     year\
%m     month (01..12)\
%d     day of month (e.g., 01)\
%H     hour (00..23)\
%M     minute (00..59)\
%S     second (00..60)
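
Python's `datetime.strftime` accepts the same directives, so the format can be checked from the notebook as well. This is a small sketch rather than part of the original task; the fixed `moment` value is an assumption, chosen only so the result is reproducible.

```python
from datetime import datetime

# Fixed example moment (an assumption for illustration, not real output of `date`).
moment = datetime(2024, 11, 11, 22, 23, 6)

# Same directives as the shell command: year, month, day, underscore,
# then hour, minute and second, all zero-padded.
stamp = moment.strftime("%Y%m%d_%H%M%S")
print(stamp)
```

Replacing `moment` with `datetime.now()` would give the current local time in the same layout.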

## Timestamped Files

Create an empty file with a name in the ```YYYYmmdd_HHMMSS.txt``` format.

```bash
touch `date +"%Y%m%d_%H%M%S.txt"`
```

## Download Today's Weather Data

Using `wget`, download the latest weather data for the Athenry weather station from Met Éireann, available here: https://prodapi.metweb.ie/observations/athenry/today.
Save the file as `weather.json` in the `data/weather` directory.

```bash
$ cd ../weather
$ wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today
```

The `-O` option specifies the filename for the download; without it, `wget` would name the file `today`.

## Timestamp the Data

Save the downloaded file with a timestamped name in the format ```YYYYmmdd_HHMMSS.json```.

```bash
wget -O `date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today 
```

## Write the Script

In the root of the repository, write a bash script ```weather.sh``` that automates saving the weather data to the `data/weather` directory.
Make it executable and test it by running it.

```bash
$ cd ../..
$ touch weather.sh
```

In the `weather.sh` file:

```bash
#!/bin/bash
date
echo "Downloading weather data"
wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today 
echo "Weather data downloaded"
date
```

Back on the command line, give the current user execute permission on the script and run it:

```bash
chmod u+x weather.sh
./weather.sh
```
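
The script's logic could also be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the module's required solution: `target_path` and `download_weather` are hypothetical names, and the download step assumes the Met Éireann URL used in `weather.sh` is reachable.

```python
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen

# Same endpoint as in weather.sh.
URL = "https://prodapi.metweb.ie/observations/athenry/today"


def target_path(now: datetime, directory: Path = Path("data/weather")) -> Path:
    """Build the timestamped file path, e.g. data/weather/YYYYmmdd_HHMMSS.json."""
    return directory / now.strftime("%Y%m%d_%H%M%S.json")


def download_weather(url: str = URL) -> Path:
    """Fetch the JSON and save it under a timestamped name (needs network access)."""
    target = target_path(datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        target.write_bytes(response.read())
    return target
```

Keeping the path construction in its own function means the naming scheme can be checked without making a network request.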




## Pandas

```python
# Read the data and display it.
df = pd.read_json("data/weather/20241111_222306.json")
df
```

|   | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|------|-------------|--------|--------------------|------|-----------|----------|-----------------------|---------------|----------|----------|----------|---------|------|------------|
| 0 | Athenry | 3 | 15n | Fog / Mist | "Fog" | 2 | - | NE | 45 | 98 | 0.0 | 1035 | Monday | 2024-11-11 | 00:00 |
| 1 | Athenry | 3 | 15n | Fog / Mist | "Fog thinning" | 2 | - | NE | 45 | 99 | 0.0 | 1035 | Monday | 2024-11-11 | 01:00 |
| 2 | Athenry | 2 | 15n | Fog / Mist | "Fog thickening" | 4 | - | NE | 45 | 98 | 0.1 | 1035 | Monday | 2024-11-11 | 02:00 |
| 3 | Athenry | 2 | 15n | Fog / Mist | "Recent Fog" | 2 | - | NE | 45 | 98 | 0.0 | 1036 | Monday | 2024-11-11 | 03:00 |
| 4 | Athenry | 2 | 15n | Fog / Mist | "Fog thickening" | 4 | - | E | 90 | 98 | 0.0 | 1036 | Monday | 2024-11-11 | 04:00 |
| 5 | Athenry | 2 | 15n | Fog / Mist | "Recent Fog" | 4 | - | NE | 45 | 99 | 0.0 | 1037 | Monday | 2024-11-11 | 05:00 |
| 6 | Athenry | 2 | 15n | Fog / Mist | "Mist" | 2 | - | N | 0 | 98 | 0.0 | 1037 | Monday | 2024-11-11 | 06:00 |
| 7 | Athenry | 2 | 15n | Fog / Mist | "Mist" | 6 | - | S | 180 | 98 | 0.0 | 1038 | Monday | 2024-11-11 | 07:00 |
| 8 | Athenry | 1 | 15d | Fog / Mist | "Recent Fog" | 4 | - | E | 90 | 99 | 0.0 | 1038 | Monday | 2024-11-11 | 08:00 |
| 9 | Athenry | 2 | 15d | Fog / Mist | "Mist" | 2 | - | E | 90 | 99 | 0.0 | 1039 | Monday | 2024-11-11 | 09:00 |


[data.gov.ie](https://data.gov.ie/dataset/todays-weather-athenry) provides the following summary of the data file:\
"This file contains a list of observations for every hour of the current day for our synoptic station in Athenry, Co Galway. The file is updated hourly. Time values are Local times."\
There are 15 columns of data, summarised below:

- name: name of the location.
- temperature: temperature in whole degrees Celsius.
- symbol: weather symbol code (for example, `15n`).
- weatherDescription: high-level weather description.
- text: low-level weather description.
- windSpeed: wind speed, measured in knots.
- windGust: wind gust; appears as `-` in the rows shown above.
- cardinalWindDirection: cardinal (compass) wind direction.
- windDirection: wind direction in degrees.
- humidity: relative humidity.
- rainfall: rainfall in millimetres.
- pressure: mean sea level (MSL) pressure in millibars.
- dayName: day of the week.
- date: date of the observation.
- reportTime: hourly timestamp of the report.

A disclaimer on the website notes that this data is not quality controlled.
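
Two of these columns may need tidying before analysis: `windGust` holds `-` rather than a number in the rows shown earlier, and the observation time is split across `date` and `reportTime`. The following is a minimal sketch of one way to handle both, not a step from the original tasks. The `sample` frame is a hypothetical stand-in built from the first three rows displayed above, the `timestamp` column name is an assumption, and `date` is assumed to be held as a string here (if `read_json` has already parsed it, it would need formatting back to text first).

```python
import pandas as pd

# Hypothetical stand-in for df, using values from the first three rows shown above.
sample = pd.DataFrame(
    {
        "temperature": [3, 3, 2],
        "windGust": ["-", "-", "-"],
        "date": ["2024-11-11", "2024-11-11", "2024-11-11"],
        "reportTime": ["00:00", "01:00", "02:00"],
    }
)

# Turn non-numeric placeholders such as "-" into NaN so the column becomes numeric.
sample["windGust"] = pd.to_numeric(sample["windGust"], errors="coerce")

# Combine the date and hourly report time into a single datetime column.
sample["timestamp"] = pd.to_datetime(sample["date"] + " " + sample["reportTime"])

print(sample.dtypes)
```

With a single `timestamp` column, the observations could be indexed or plotted by time directly.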

```python
# Print information on the dataframe
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   name                   23 non-null     object
 1   temperature            23 non-null     int64
 2   symbol                 23 non-null     object
 3   weatherDescription     23 non-null     object
 4   text                   23 non-null     object
 5   windSpeed              23 non-null     int64
 6   windGust               23 non-null     object
 7   cardinalWindDirection  23 non-null     object
 8   windDirection          23 non-null     int64
 9   humidity               23 non-null     int64
 10  rainfall               23 non-null     float64
 11  pressure               23 non-null     int64
 12  dayName                23 non-null     object
 13  date
```

```python
# Generate some simple stats on the data.
df.describe()
```

|   | temperature | windSpeed | windDirection | humidity | rainfall | pressure | date |
|---|-------------|-----------|---------------|----------|----------|----------|------|
| count | 23.0 | 23.0 | 23.0 | 23.0 | 23.0 | 23.0 | 23 |
| mean | 5.391304 | 4.565217 | 82.173913 | 93.695652 | 0.004348 | 1038.434783 | 2024-11-11 00:00:00 |
| min | 1.0 | 2.0 | 0.0 | 74.0 | 0.0 | 1035.0 | 2024-11-11 00:00:00 |
| 25% | 2.0 | 3.0 | 67.5 | 92.5 | 0.0 | 1037.0 | 2024-11-11 00:00:00 |
| 50% | 3.0 | 4.0 | 90.0 | 98.0 | 0.0 | 1039.0 | 2024-11-11 00:00:00 |
| 75% | 8.5 | 7.0 | 90.0 | 98.0 | 0.0 | 1040.0 | 2024-11-11 00:00:00 |
| max | 13.0 | 7.0 | 180.0 | 99.0 | 0.1 | 1041.0 | 2024-11-11 00:00:00 |
| std | 4.031006 | 1.996044 | 34.994353 | 7.754382 | 0.020851 | 1.996044 |  |


## End