# Weather Data Processing Report
***

This report describes how data was collected from https://prodapi.metweb.ie/observations/athenry/today using Terminal. Each section builds on the previous one, focusing on creating directory structures, manipulating timestamps, downloading weather data, and automating processes using Bash scripting. Below are the task descriptions, steps taken, and commands used.   
The last part of this report analyses one of the downloaded data files and briefly explains the dataset with reference to data.gov.ie.

## 1. Collecting Data 

### 1.1 Create Directory Structure

Using the command line, a directory named `data` was created at the root of the repository `computer_infrastructure`. Within `data`, two subdirectories were then created: `timestamps` and `weather`.
  
To complete the directory structure, the following process was carried out using terminal commands:

1. Verifying the Current Directory:  

    The process began by ensuring the current working directory was the root of the repository, `computer_infrastructure`.  
    This was confirmed by examining the terminal prompt `-> computer_infrastructure git:(main)` and listing the contents using the `ls` command, which showed only the `README.md` file.

2. Researching Commands for Directory Creation:  

    Research was conducted to identify the appropriate command for creating directories.  
    An online tutorial from the Simple Dev website provided information on using the `mkdir` command to create folders [[5]](#5).

3. Creating the Main Directory:  

    The main directory, `data`, was created in the root of the repository using the command:  
    ```bash
    mkdir data
    ```
    The `ls` command was used again to verify that the `data` directory had been successfully created.

4. Creating Subdirectories:  

    To create the subdirectories `timestamps` and `weather` within `data`, the `mkdir` command with the `-p` option was used. 
    This option allows nested directory creation.  
    The command executed was:  
    ```bash
    mkdir -p data/timestamps data/weather  
    ```
    This created both subdirectories simultaneously under the `data` directory.

5. Correcting Subdirectory Name:  

    During verification, it was observed that one of the subdirectories was mistakenly named `timestamp` instead of `timestamps`.   
    To address this, the `mv` command was used for renaming. After reviewing the `mv` manual, using the `man mv` command, the correct syntax was determined and executed as follows: 
    ```bash
    mv data/timestamp data/timestamps
    ```
    This renamed the subdirectory to the correct name.
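
The directory layout from the steps above can also be produced in a single command, because `mkdir -p` creates any missing parent directory and does not fail when a directory already exists. The following is a minimal sketch; the scratch directory created with `mktemp -d` is an assumption made here so that the example does not touch the real repository.

```bash
# Work in a throwaway directory so the real repository is untouched (assumption).
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# -p creates the missing parent `data` as well as both subdirectories.
mkdir -p data/timestamps data/weather

# Running it again is harmless: -p ignores directories that already exist.
mkdir -p data/timestamps data/weather

# List the result, one entry per line.
ls -1 data
```

Creating the directories with the correct names in one step would also avoid the rename described in step 5.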

### 1.2 Timestamps

From the `data/timestamps` directory, the `date` command was used to output the current date and time. The output was appended to a file named `now.txt` using the `>>` operator so that existing content was not overwritten. This process was repeated ten times, and the `more` command was used to verify that the file contained the expected content.

The following steps describe the approach in detail:

1. Navigate to the Target Directory: 

    The directory `data/timestamps` was accessed from the root of the repository `computer_infrastructure` using the `cd` command [[6]](#6): 
    ```bash
    cd data/timestamps
    ```    
2. Append Date and Time to the File:   

    Research and class materials provided insights on formatting and appending date-time entries to a file. The `date` command with the `>>` operator was used to append the current date and time to `now.txt`.  
    The command also included a format string to ensure the output followed the `YYYYMMDDHHMMSS` pattern: 
    ```bash
    date +"%Y%m%d%H%M%S" >> now.txt
    ```
    The `>>` operator appends the output to the file without overwriting it [[7]](#7).   
    The `+"%Y%m%d%H%M%S"` format ensures that the timestamp is recorded in a compact and consistent format.
    
3. Verify File Creation and Content:   

    The `ls` command was used to list files in the timestamps directory, confirming the creation of `now.txt` [[8]](#8).  
    The `cat` command was then used to display the file content for verification [[9]](#9):   
    ```bash
    cat now.txt
    ```
4. Repeat the Process:   

    The date appending command was executed nine additional times to add more timestamps to `now.txt`.

5. Final Verification Using `more`:  

    To review the content of `now.txt` after all entries were appended, the `more` command was used:  
    ```bash
    more now.txt
    ```
    Unlike `cat`, which displays all content at once, `more` is helpful for viewing larger files in a paginated manner [[10]](#10).
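
In the report the append command was run by hand ten times. As a hedged alternative, the repetition could be written as a loop. The sketch below assumes a scratch directory (so an existing `now.txt` is not modified) and runs the ten appends back to back, so the recorded timestamps will be nearly identical rather than spread out in time.

```bash
# Scratch directory (assumption) so an existing now.txt is not modified.
cd "$(mktemp -d)" || exit 1

# Append ten timestamps; >> creates now.txt on the first pass and appends afterwards.
for i in $(seq 1 10); do
    date +"%Y%m%d%H%M%S" >> now.txt
done

# Count the lines to confirm that ten entries were written.
wc -l < now.txt
```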

### 1.3 Formatting Timestamps

The `date` command was run again with the output formatted as `YYYYmmdd_HHMMSS` (e.g., `20261114_130003` for 1:00:03 PM on November 14, 2026). The `man date` page was referenced for formatting options and exited by pressing `q`. The formatted output was appended to a file named `formatted.txt`.

Below are the steps and commands used to complete the task:

1. Consulting the Manual for the `date` Command:  

   To determine how to format the output of the `date` command, the `man` (manual) page for the command was consulted:  
   ```bash
   man date
   ```
   This provided the formatting options, including:

   - Year: `%Y`
   - Month: `%m`
   - Day: `%d`
   - Hour: `%H`
   - Minute: `%M`
   - Second: `%S`  

   It was noted that to display the formatted date, the `+` operand must be used, followed by the desired format enclosed in double quotes. For example:  
   ```bash
   date +"%Y%m%d_%H%M%S"
   ```   
   For additional reference, the documentation at PhoenixNAP was consulted, reinforcing the use of formatting strings and the `+` operand [[11]](#11). 

2. Running the Formatted `date` Command:  

   To test the formatting, the following command was executed:  
   ```bash
   date +"%Y%m%d_%H%M%S"
   ```   
   This returned the current date and time in the expected format (e.g., `20241020_214025`).

3. Navigating to the Target Directory:  

   As the `formatted.txt` file was to be created in the `data` directory (used previously in Tasks 1 and 2), navigation was required. From the `data/timestamps` directory, the following command moved up one level:  
   ```bash
   cd ../
   ``` 
   The shorthand `../` refers to the parent directory and can also be used directly inside a path.

4. Appending the Formatted Date to `formatted.txt`:  

   While in the `data` directory, the following command appended the formatted date and time to the `formatted.txt` file:  
   ```bash
   date +"%Y%m%d_%H%M%S" >> formatted.txt
   ```
   The `>>` operator ensured the output was appended to the file without overwriting any existing content.

5. Verifying the File Content:  

   To confirm the file content, the `cat` command was used:  
   ```bash
   cat formatted.txt
   ```     
   This displayed the date and time in the correct format (e.g., `20241020_223520`).
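
The visual check with `cat` can be complemented by a pattern check. The sketch below is illustrative only: the variable name `stamp` and the use of `grep -E` are assumptions introduced here, not part of the original workflow. It tests whether the generated string consists of eight digits, an underscore, and six digits.

```bash
# Capture the formatted timestamp in a variable (the name `stamp` is illustrative).
stamp="$(date +"%Y%m%d_%H%M%S")"

# Eight digits, an underscore, then six digits: YYYYmmdd_HHMMSS.
if printf '%s\n' "$stamp" | grep -Eq '^[0-9]{8}_[0-9]{6}$'; then
    echo "format ok: $stamp"
else
    echo "unexpected format: $stamp" >&2
fi
```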

### 1.4 Create Timestamped Files

The `touch` command was used to create an empty file named in the `YYYYmmdd_HHMMSS.txt` format. This was achieved by embedding the `date` command using backticks (\`) within the `touch` command. Redirection (`>>`) was not used in this step.

Steps used:

1. Researching the Approach:  

   While researching how to create a file with a dynamically generated name, a helpful reference was found on SuperUser [[12]](#12). This outlined how to combine the `touch` command with the `date` command using backticks.  

   The format for embedding the date was:
   - Backticks (\`) to encapsulate the `date` command.
   - The `date` command followed by the `+` operand to specify the desired output format.
   - Adding `.txt` after the formatted date to ensure the file is created as a text file.  

2. Constructing the Command:  

   The following command was used to create the file:  
   ```bash
   touch `date +"%Y%m%d_%H%M%S.txt"`
   ```
   - touch: Creates a new empty file.  
   - \`date +"%Y%m%d_%H%M%S"\`: Formats the current date and time into the desired `YYYYmmdd_HHMMSS` structure.  
   - Backticks (\` \`): Embed the formatted date output as the file name.  
   - .txt: Ensures the file extension is `.txt`.  

3. Verifying the File Creation:  

   After executing the command, the `ls` command was run to list the files in the directory.
   
   This confirmed the creation of the file `20241020_225246.txt` (or a similarly formatted name based on the exact time the command was executed).
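
Backticks are one of two command substitution syntaxes in the shell; the `$(...)` form is equivalent and is often easier to read and to nest. The following sketch shows the same file creation with `$(...)`. The scratch directory and the variable name `filename` are assumptions made for illustration.

```bash
# Scratch directory (assumption) so no stray files are left in the repository.
cd "$(mktemp -d)" || exit 1

# $(...) substitutes the output of `date`, exactly as the backticks do.
filename="$(date +"%Y%m%d_%H%M%S").txt"

# Quoting the variable keeps the name intact as a single argument.
touch "$filename"

# Confirm the empty file exists.
ls -1
```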


### 1.5 Download Today's Weather Data

The working directory was changed to `data/weather`, and the latest weather data for the Athenry weather station was downloaded from Met Éireann using the `wget` command. The `-O <filename>` option was used to save the file as `weather.json`. The data was obtained from this URL [[13]](#13):

https://prodapi.metweb.ie/observations/athenry/today

Below is a detailed explanation of the steps taken:

1. Installing `wget` 

   Initially, the `wget` command was not available on the system. Installation instructions were researched online and a solution was found on Stack Overflow [[14]](#14), which suggested using the Homebrew package manager. To install `wget`, the following command was used in Terminal:  
   ```bash
   brew install wget
   ```
   After the installation, it was confirmed that `wget` was available by typing `wget --help`, which displayed the list of options and usage instructions.

2. Understanding `wget` Options 

   To understand how the `wget` command works, especially the `-O` option, the manual was consulted by typing:  
   ```bash
   man wget
   ```
   From the manual, it was learned that the `-O` option, also written as `--output-document`, allows users to specify a single output filename for the downloaded content. This option is useful when the original filename from the URL is not descriptive or suitable for the task.  

3. Navigating to the Target Directory 

   The `data/weather` directory was accessed by running:  
   ```bash
   cd data/weather
   ```
   This ensured the downloaded file would be stored in the correct location.

4. Downloading the Weather Data

   Using the `wget` command with the `-O` option, the data was downloaded from the specified URL and saved as `weather.json`. The command used was:  
   ```bash
   wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today
   ``` 
   This successfully downloaded the weather data and saved it under the name `weather.json` in the current directory.

5. Verifying the Download

   After the download, the presence of `weather.json` was confirmed by listing the contents of the directory using the `ls` command.
    
   The output showed that `weather.json` was successfully created.  

6. Viewing the File Content  

   To inspect the contents of the file, the `nano` text editor was used:  
   ```bash
   nano weather.json
   ``` 
   This confirmed that the file contained the expected weather data in JSON format.


### 1.6 Timestamp the Data

The command from Task 5 was modified to save the downloaded file with a timestamped name in the format `YYYYmmdd_HHMMSS.json`. The `date` command was used to generate the timestamp, and the `wget` command was updated to include the timestamped filename.

This task drew on the knowledge gained in sections 1.4 and 1.5. Steps:

1. The command was structured as follows:
    ```bash
    wget -O `date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
    ```
    Here’s a breakdown of the command:  
    - `wget`: Used to download the weather data from the specified URL.  
    - `-O`: Specifies the output file name.  
    - \`date +"%Y%m%d_%H%M%S.json"\`: Generates the timestamped file name:   
        - using the current date and time in the desired format.   
        - appending `.json` to indicate the file type.   
        - The backticks (\`) embed the output of the `date` command directly into the `wget` command, dynamically naming the file.    

2. After running the command in the Terminal, the file was successfully saved in the current directory with the timestamped name, such as `20241124_153025.json`. 

3. To confirm that the file contained the expected weather data, the `nano` command was used to open and verify its content, ensuring it was in the correct JSON format and included the weather data as expected.
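
The timestamped download can be sketched as two small shell functions, which separates the naming logic from the network call. The function names `timestamped_name` and `download_weather` and the variable `url` are hypothetical names introduced here; only the URL and the `wget -O` usage come from the steps above. The sketch prints the name that would be used and does not perform the download unless `download_weather` is called.

```bash
# URL taken from this report; the function and variable names are illustrative.
url="https://prodapi.metweb.ie/observations/athenry/today"

# Print a file name of the form YYYYmmdd_HHMMSS.json.
timestamped_name() {
    date +"%Y%m%d_%H%M%S.json"
}

# Download the observations into a timestamped file in the current directory.
download_weather() {
    wget -O "$(timestamped_name)" "$url"
}

# Show the name the next download would use (no network access needed).
timestamped_name

# To perform the download, call: download_weather
```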

### 1.7 Write the Script

A bash script called `weather.sh` was written in the root of the repository to automate the process from Task 6. The script downloads the weather data and saves it to the `data/weather` directory with a timestamped filename in the format `YYYYmmdd_HHMMSS.json`. The script was made executable, and it was tested by running it. See the steps below:


1. Creating the Script File

   The process began with the creation of an empty file named `weather.sh` in the `weather` directory using the `touch` command:  
   ```bash
   touch weather.sh
   ```
   The script file was then moved to the root directory of the repository (`computer_infrastructure`) to ensure accessibility. This was achieved using the `mv` command:  
   ```bash
   mv weather.sh ../../
   ``` 
   Apple’s Terminal guide on moving files between directories was referenced for this step [[15]](#15).

2. Adding the Script Content

   The file `weather.sh` was opened in Visual Studio Code, and a "shebang" (`#!/bin/bash`) was added as the first line to indicate that the script should be executed using the bash shell. This ensures the system interprets the script correctly [[16]](#16).  

   The second line contained the command used in Task 6 to download the weather data and save it with a timestamped filename in the `data/weather` directory:  
   ```bash
   wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
   ``` 
   Care was taken to ensure there were no spaces between `data/weather/` and the `date` command, as such spaces could result in filename errors.

3. Making the Script Executable

   The permissions of `weather.sh` were checked in the root of the repository `computer_infrastructure` using:  
   ```bash
   ls -al
   ``` 
   It was observed that the file lacked execute permissions, which are required to run the script. The permissions were updated using the `chmod` command:  
   ```bash
   chmod u+x weather.sh
   ``` 
   Re-running `ls -al` confirmed the change, with the `x` permission now visible and the file name displayed in a distinct color, indicating its executable status [[17]](#17).

4. Testing the Script  

   The script was tested by executing it from the root directory:  
   ```bash
   ./weather.sh
   ``` 
   An initial error occurred:  
   ```bash
   data/weather/: Is a directory
   ```  
   Troubleshooting with ChatGPT revealed that the error was caused by an unintended space between `data/weather/` and the `date` command in the script [[18]](#18):
   
   - Before Change: 
   ```bash
   wget -O data/weather/ `date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
   ```
   - After Change: 
   ```bash
   wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
   ```
   The space was removed, and the script was executed again. This successfully created a file with a timestamped name (e.g., `20241028_163417.json`) in the `data/weather` directory.

5. Verifying the Output

   The content of the generated file was verified by opening it in the `nano` text editor:  
   ```bash
   nano data/weather/20241028_163417.json
   ```
   This confirmed that the file contained the expected weather data in JSON format.
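
The effect of the stray space described in step 4 can be illustrated without running `wget`. The sketch below uses a hypothetical helper, `count_args`, that only reports how many arguments the shell passes to it. With the space, the directory and the file name arrive as two separate words, so `-O` would receive only `data/weather/`, which is consistent with the `Is a directory` error. Without the space they form a single path.

```bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

stamp="$(date +"%Y%m%d_%H%M%S.json")"

# With a space, the directory and the file name are two separate words.
with_space="$(count_args -O data/weather/ "$stamp")"

# Without the space, they form a single path argument.
without_space="$(count_args -O data/weather/"$stamp")"

echo "with space: $with_space arguments"
echo "without space: $without_space arguments"
```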

## 2. Weather Data Analysis


In this task, the weather data file, previously downloaded using a script, is loaded into the `weather.ipynb` notebook using the `pandas.read_json()` function. The data is then examined and summarized to understand its structure and key variables using Pandas. A brief explanation of the dataset's contents is provided, referencing the information available on data.gov.ie. This includes details about the types of weather data collected, the geographic coverage, the time period covered, and the frequency of data collection.

### 2.1 Loading the data

```python
# Data frame
import pandas as pd
```

The dataset is read into the notebook using the `pandas.read_json()` function [[19]](#19). This command loads the collected weather data from the specified JSON file:

```python
# Read the data
df = pd.read_json('data/weather/20241028_163417.json')
```

### 2.2 Inspecting the data

It is important to inspect the data and understand its structure. Therefore, `df.head()` is used to show the first five rows of the dataframe [[20]](#20). This is a convenient way to quickly inspect the structure and content of the data, including the column names and the first few data entries.

```python
# Display the first rows to get an overview of the data
df.head()
```

|   | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Athenry | 14 | 46n | Light rain | "Light rain " | 22 | - | SW | 225 | 97 | 0.7 | 1015 | Monday | 2024-10-28 | 00:00 |
| 1 | Athenry | 14 | 05n | Rain showers | "Rain shower" | 22 | - | SW | 225 | 97 | 0.7 | 1015 | Monday | 2024-10-28 | 01:00 |
| 2 | Athenry | 14 | 05n | Rain showers | "Rain shower" | 20 | - | SW | 225 | 98 | 0.3 | 1015 | Monday | 2024-10-28 | 02:00 |
| 3 | Athenry | 14 | 46n | Light rain | "Light Drizzle " | 15 | - | W | 270 | 98 | 0.4 | 1015 | Monday | 2024-10-28 | 03:00 |
| 4 | Athenry | 13 | 46n | Light rain | "Recent Drizzle " | 17 | - | W | 270 | 99 | 0.3 | 1015 | Monday | 2024-10-28 | 04:00 |


The output of `df.head()` shows that this dataset contains 15 columns of weather-related conditions in Athenry on October 28, 2024. It provides a detailed snapshot of the weather at hourly intervals, capturing key parameters like temperature, humidity, wind, and rainfall for the given day.

### 2.3 Data Type

Next, the function `df.info()` from pandas is used to have a closer look into the different types of data used in this dataframe [[21]](#21):

```python
# Summary of DataFrame
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   name                   17 non-null     object        
 1   temperature            17 non-null     int64         
 2   symbol                 17 non-null     object        
 3   weatherDescription     17 non-null     object        
 4   text                   17 non-null     object        
 5   windSpeed              17 non-null     int64         
 6   windGust               17 non-null     object        
 7   cardinalWindDirection  17 non-null     object        
 8   windDirection          17 non-null     int64         
 9   humidity               17 non-null     int64         
 10  rainfall               17 non-null     float64       
 11  pressure               17 non-null     int64         
 12  dayName                17 non-null     object        
 13  date   
```

The output shows that the dataset contains 17 entries and 15 columns. All columns have non-null values, meaning there are no missing data points. The data types include integers (for columns like temperature, windSpeed, and humidity), floats (for rainfall), objects (for text-based columns like name, symbol, and weatherDescription), and a single datetime column (date), which represents the date of the weather observations.

### 2.4 Descriptive Statistics

Finally, descriptive statistics are used to draw conclusions from the data loaded from `20241028_163417.json`. To do that, the `describe()` function from pandas is used [[22]](#22):

```python
# Descriptive Statistics
df.describe()
```

|   | temperature | windSpeed | windDirection | humidity | rainfall | pressure | date | reportTime |
|---|---|---|---|---|---|---|---|---|
| count | 17.0 | 17.0 | 17.0 | 17.0 | 17.0 | 17.0 | 17 | 17 |
| mean | 13.647059 | 13.411765 | 259.411765 | 93.352941 | 0.165294 | 1016.823529 | 2024-10-28 00:00:00 | 2024-12-01 08:00:00 |
| min | 13.0 | 7.0 | 225.0 | 81.0 | 0.0 | 1015.0 | 2024-10-28 00:00:00 | 2024-12-01 00:00:00 |
| 25% | 13.0 | 11.0 | 270.0 | 87.0 | 0.0 | 1015.0 | 2024-10-28 00:00:00 | 2024-12-01 04:00:00 |
| 50% | 14.0 | 11.0 | 270.0 | 97.0 | 0.1 | 1017.0 | 2024-10-28 00:00:00 | 2024-12-01 08:00:00 |
| 75% | 14.0 | 15.0 | 270.0 | 98.0 | 0.3 | 1018.0 | 2024-10-28 00:00:00 | 2024-12-01 12:00:00 |
| max | 15.0 | 22.0 | 270.0 | 99.0 | 0.7 | 1020.0 | 2024-10-28 00:00:00 | 2024-12-01 16:00:00 |
| std | 0.606339 | 4.500817 | 19.675679 | 6.274364 | 0.236593 | 1.810915 |  |  |


Based on the analysis of the output of `df.describe()` the following can be concluded:
- The weather in Athenry was marked by relatively stable conditions. The temperature averaged around 13.6°C, with only small fluctuations between 13°C and 15°C. 
- Humidity levels were consistently high, averaging 93%.
- Wind speeds were moderate, ranging from 7 km/h to 22 km/h, with winds coming primarily from the west (270°) and from the southwest (225°) in the early hours [[23]](#23).
- Rainfall was light but consistent, with a mean of 0.17 mm, and the atmospheric pressure remained relatively stable at around 1016.8 hPa.   
   
These conditions suggest a steady, rainy day with mild temperatures and high humidity, typical of cool, wet weather [[24]](#24).

### 2.5 Dataset content

The dataset provides detailed hourly weather observations for Athenry, Co. Galway, covering key parameters such as temperature (in degrees Celsius), weather descriptions, wind speed (in km/h), wind direction (in degrees), humidity, rainfall (in mm), and atmospheric pressure (in hPa). The data is updated hourly and captures local time values for each observation. 
It is important to note that the data is not quality-controlled, and the geographic coverage is specific to Athenry. This dataset is part of Met Éireann's continuous weather monitoring and is freely available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, with the requirement to attribute Met Éireann as the data source [[25]](#25).

## 3. Conclusion



***
# End