# Computer Infrastructure - Weather Data Automation  
**Author:** Marcella Morgan  

![Met Éireann Blocking Data Requests](met_eireann.png)




This notebook outlines the tasks I completed for the Computer Infrastructure module of the [Higher Diploma in Science in Data Analytics given by ATU Galway-Mayo](https://www.gmit.ie/higher-diploma-in-science-in-computing-in-data-analytics). My lecturer was [Ian McLoughlin](https://github.com/ianmcloughlin). The module focuses on three main areas:

1. **Command Line Tools**: Learning and using commands to manipulate files, create directories, and work with data.  
2. **Bash Scripting**: Automating repetitive tasks, such as downloading weather data and saving it with timestamps.  
3. **Automation with GitHub Actions**: Setting up a workflow that runs the script daily and commits the results back to my repository.  

Each task builds on the last, starting with simple commands and leading up to an automated solution. Along the way, I ran into a few challenges, from Git ignoring empty folders to formatting timestamps and learning how to make scripts executable.  

I’ll go through each task, explaining the commands, what they do, and how I tackled any issues I ran into.  

Credits: The image of Met Éireann cowering behind a firewall while demonic students hurl hundreds of spear-like data requests at it was created with DALL·E.




## Task 1: Creating a Directory Structure  

I used the `mkdir` command to create a main directory called `data`, with two subdirectories: `timestamps` and `weather`. To move between directories, I used `cd`.  

**Issue**: Git tracks files rather than directories, so it ignores empty directories, which was really confusing at first. After some searching, I found out you need to add a placeholder file to commit an empty directory.  

**Useful Link**:  
[How to add empty directories in Git](https://www.geeksforgeeks.org/how-to-add-an-empty-directory-to-a-git-repository/)
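
A minimal sketch of the steps above, assuming a Bash-like shell. The `-p` flag and the `.gitkeep` placeholder name are my assumptions here: `.gitkeep` is only a common convention, and any file would make Git track the directory.

```shell
# Create data/ with both subdirectories in one command. The -p flag also
# creates missing parent directories and does not fail if they already exist.
mkdir -p data/timestamps data/weather

# Git only tracks files, so put a placeholder file in each empty directory.
# The name .gitkeep is a convention, not something Git itself requires.
touch data/timestamps/.gitkeep data/weather/.gitkeep

# List the result (including hidden files) to confirm the structure.
ls -A data/timestamps data/weather
```

After this, `git add data` would stage the two placeholder files, and with them the directories.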


**Other Links**:  

- [Creating a folder from the Windows command prompt](https://mspoweruser.com/cmd-create-folder/)
- [Microsoft documentation for `mkdir`](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/mkdir)
- [Creating multiple directories at once with `mkdir`](https://askubuntu.com/questions/731721/is-there-a-way-to-create-multiple-directories-at-once-with-mkdir)
- [Creating a directory in Python](https://www.freecodecamp.org/news/creating-a-directory-in-python-how-to-create-a-folder/)

## Task 2: Timestamps

I used the `date` command to output the current date and time and appended the output to a file named `now.txt`, making sure to use the `>>` operator so that each run appends to the file rather than overwriting it. I repeated this step ten times and used the `more` command to verify that `now.txt` had the expected content.

Note that `now.txt` is a file rather than a directory, so `mkdir` is not needed for it: the `>>` operator creates the file if it does not already exist.
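
The append step can be sketched as below. The loop is a shortcut I am assuming for illustration; running `date >> now.txt` by hand ten times has the same effect. The sketch uses `cat` and `wc` for the check because `more` pages its output interactively.

```shell
# Append the current date and time to now.txt ten times.
# >> appends to the file (creating it if needed); > would overwrite it.
i=1
while [ "$i" -le 10 ]; do
    date >> now.txt
    i=$((i + 1))
done

# Check the result: count the lines, then print the file.
wc -l < now.txt
cat now.txt
```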

**Useful Links**:  

- [Microsoft documentation for `date`](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/date)
- [Creating files and directories in Linux](https://www.redhat.com/sysadmin/create-delete-files-directories-linux)
- [Issues around using `touch` in PowerShell](https://stackoverflow.com/questions/67659993/touch-command-not-working-what-should-i-use-instead/67665941#67665941)
- [Changing the default terminal in VS Code](https://stackoverflow.com/questions/69040449/how-to-change-default-terminal-in-vs-code)

I had issues around using more too.


## Task 3: Formatting Timestamps

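I had issues accessing the manual with `man date` because Git Bash does not include the `man` command, so I used `date --help` instead.

**Useful Link**:  
[Git Bash: no `man` command](https://stackoverflow.com/questions/51262422/git-bash-no-man-command)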
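
A short sketch of timestamp formatting with `date`. The format string is an assumption based on the `YYYYmmdd_HHMMSS` pattern in the weather file name used later in this notebook, and `formatted_example.txt` is a hypothetical file name.

```shell
# Print the current date and time as YYYYmmdd_HHMMSS.
# %Y = four-digit year, %m = month, %d = day, %H%M%S = hours, minutes, seconds.
date +"%Y%m%d_%H%M%S"

# Append a formatted timestamp to a file (hypothetical file name).
date +"%Y%m%d_%H%M%S" >> formatted_example.txt

# Where `man date` is unavailable, `date --help` lists the format codes
# (this applies to GNU date, as found in Git Bash and most Linux systems).
```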

## Task 4: Create Timestamped Files



## Task 7

When I first tried to run the `weather.sh` script, I ran into a couple of problems. At first, I wasn’t sure how to execute the script in the terminal: I didn’t realise you need to put `./` before the script name to tell the terminal to run it from the current directory. That was an easy fix once I figured it out.
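
A minimal sketch of making a script executable and running it from the current directory. `example.sh` is a hypothetical stand-in created only for this illustration; it is not the `weather.sh` script itself.

```shell
# Create a tiny stand-in script (hypothetical, for illustration only).
cat > example.sh <<'EOF'
#!/bin/sh
echo "script ran"
EOF

# Give the script execute permission; without it, ./example.sh is refused.
chmod +x example.sh

# The ./ prefix tells the shell to look in the current directory,
# which is normally not on the PATH.
./example.sh
```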

The next issue was with the `wget` command, which wasn’t recognised. It turned out that `wget` wasn’t installed on my system, which I didn’t expect since it’s such a standard tool. After spending a bit of time trying to fix this locally, I realised it was easier to switch to GitHub Codespaces, where `wget` and the other tools I needed are already installed. Once I moved the project to Codespaces, everything ran smoothly, and I could execute the script without any further problems.

It was a good reminder that environment issues can slow you down, and having a consistent setup (like Codespaces) is worth it!
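
One way to catch a missing tool early is to check for it before downloading. The sketch below is an assumption about how such a check could look, not the contents of `weather.sh`. `WEATHER_URL` is a hypothetical variable that must be set to the data source, and the file name pattern is taken from the weather file used later in this notebook.

```shell
# Build a timestamped output path such as data/weather/<timestamp>_athenry.json.
outfile="data/weather/$(date +"%Y%m%d_%H%M%S")_athenry.json"
mkdir -p data/weather

# Download only if wget is installed and a URL has been provided.
# WEATHER_URL is hypothetical; set it to the real data source before running.
if [ -n "${WEATHER_URL:-}" ] && command -v wget >/dev/null 2>&1; then
    wget -O "$outfile" "$WEATHER_URL"
else
    echo "wget or WEATHER_URL missing; would have saved to: $outfile"
fi
```

`command -v` prints the path of a command if the shell can find it and returns a non-zero status otherwise, so it works as a quick availability test.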

When I first tried to set up the GitHub Actions workflow to push the weather data back to my repository, I kept running into a permissions issue. Initially, I tried using the `GITHUB_TOKEN` for authentication in the workflow, but the push still failed. After a bit of troubleshooting, I realised that the repository’s workflow permissions were set to read-only, which was blocking the push.

To fix this, I updated the repository settings to allow Read and Write permissions for workflows. Once that was sorted, I re-ran the workflow, and it worked perfectly: new weather files were being created and pushed to the repository automatically! A good reminder to check your settings when something doesn’t seem to be working.



**Useful Link**:  
[Controlling permissions for `GITHUB_TOKEN`](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/controlling-permissions-for-github_token)

## Task 9: Weather Data Analysis

For this task I explored the weather data retrieved from Met Éireann for Athenry using a script automated with GitHub Actions. The goal is to demonstrate data exploration and analysis with pandas, focusing on trends and key insights from the dataset. I also attempted to present the results in a polished format that would be suitable for an employer.


### Loading Data

Copy and paste file path to dataset here:

In [13]:
data = 'data/weather/20241218_120632_athenry.json' 

### Looking at Data

**Looking at dataset:**

In [89]:
import pandas as pd

df = pd.read_json(data)
df.head(5)

| | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Athenry | 13 | 05n | Rain showers | "Rain shower" | 22 | - | SW | 225 | 96 | 0.3 | 993 | Wednesday | 2024-12-18 | 00:00 |
| 1 | Athenry | 13 | 05n | Rain showers | "Rain shower" | 30 | - | SW | 225 | 94 | 0.7 | 992 | Wednesday | 2024-12-18 | 01:00 |
| 2 | Athenry | 12 | 05n | Rain showers | "Rain shower" | 22 | 44 | SW | 225 | 96 | 0.8 | 991 | Wednesday | 2024-12-18 | 02:00 |
| 3 | Athenry | 12 | 05n | Rain showers | "Rain shower" | 26 | 46 | SW | 225 | 97 | 0.8 | 991 | Wednesday | 2024-12-18 | 03:00 |
| 4 | Athenry | 12 | 09n | Rain | "Moderate rain " | 28 | - | NW | 315 | 95 | 1.6 | 993 | Wednesday | 2024-12-18 | 04:00 |


In [90]:
dtypes_df = pd.DataFrame(df.dtypes, columns=["dtypes"])
dtypes_df.style.background_gradient()

| | dtypes |
| --- | --- |
| name | object |
| temperature | int64 |
| symbol | object |
| weatherDescription | object |
| text | object |
| windSpeed | int64 |
| windGust | object |
| cardinalWindDirection | object |
| windDirection | int64 |
| humidity | int64 |


**Brief Description of dataset showing count, mean, max and min values:**

In [94]:
df['date'] = pd.to_datetime(df['date'])
# Build the combined timestamp from the raw "HH:MM" strings before reportTime is
# converted; otherwise the combined string contains two dates and is misparsed.
df['datetime'] = pd.to_datetime(df['date'].dt.strftime('%Y-%m-%d') + ' ' + df['reportTime'])
# Parse the report time with an explicit format (the date part defaults to 1900-01-01).
df['reportTime'] = pd.to_datetime(df['reportTime'], format='%H:%M')

df.describe(include=[float, int])


| | temperature | windSpeed | windDirection | humidity | rainfall | pressure |
| --- | --- | --- | --- | --- | --- | --- |
| count | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 | 12.0 |
| mean | 10.583333 | 17.416667 | 258.75 | 92.916667 | 0.51 | 997.5 |
| std | 1.676486 | 8.447252 | 27.97117 | 2.74552 | 0.668036 | 5.40202 |
| min | 9.0 | 6.0 | 225.0 | 88.0 | 0.0 | 991.0 |
| 25% | 9.0 | 10.5 | 225.0 | 91.0 | 0.0 | 992.75 |
| 50% | 10.0 | 18.5 | 270.0 | 92.5 | 0.155 | 997.5 |
| 75% | 12.0 | 23.0 | 270.0 | 95.25 | 0.8 | 1002.25 |
| max | 13.0 | 30.0 | 315.0 | 97.0 | 1.9 | 1005.0 |


In [73]:
print(df['datetime'].dtype)
print(df['reportTime'].dtype)
print(df['date'].dtype)

datetime64[ns]
datetime64[ns]
datetime64[ns]


**Checking for missing values:**

In [52]:
missing_values = df.isnull().sum()
missing_values_df = pd.DataFrame(missing_values, columns=["Missing Values"])
missing_values_df.style.background_gradient()


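| | Missing Values |
| --- | --- |
| name | 0 |
| temperature | 0 |
| symbol | 0 |
| weatherDescription | 0 |
| text | 0 |
| windSpeed | 0 |
| windGust | 0 |
| cardinalWindDirection | 0 |
| windDirection | 0 |
| humidity | 0 |

Note that `windGust` shows no missing values only because the `-` placeholder in the data is stored as a string, which is also why its dtype is `object`.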


### Closer Look at Temperature

In [46]:
temperature_df = df[['reportTime','temperature']].copy()
temperature_df.style.background_gradient(cmap="coolwarm")
#temperature_df.style.hide_index()


| | reportTime | temperature |
| --- | --- | --- |
| 0 | 00:00 | 13 |
| 1 | 01:00 | 13 |
| 2 | 02:00 | 12 |
| 3 | 03:00 | 12 |
| 4 | 04:00 | 12 |
| 5 | 05:00 | 10 |
| 6 | 06:00 | 10 |
| 7 | 07:00 | 9 |
| 8 | 08:00 | 9 |
| 9 | 09:00 | 9 |


## END