# Computer Infrastructure - Tasks Report & Project
##### Author: E. Qejvani

This notebook holds the steps taken to complete the tasks in the Computer Infrastructure module - ATU 2024.
***

### Task 1:  Create Directory Structure.

<em>Using the command line, create a directory (that is, a folder) named data at the root of your repository. Inside data, create two subdirectories: timestamps and weather.</em>
***

##### Steps Taken to Complete the Task:

1. Logged into my GitHub account.
2. Started Codespaces on GitHub.
3. Created two directories within the data parent directory using the following commands:
    - `mkdir -p data/weather`
        (The -p flag creates any missing parent directories, so data is created first and weather is nested inside it).
    - `mkdir data/timestamps`
        (This command creates the timestamps directory within the data directory).

<img src="./img/task1.png" alt="Task 1">

##### Commands Used to Complete the Task:

- ```mkdir```: stands for "make directory" and helps the user organize files by creating folders/directories. The ```mkdir``` command can create multiple directories at the same time and can also set permissions on those directories (with the ```-m``` option). For this task I used the ```-p``` option, which creates any missing parent directories along the given path and does not report an error if a directory already exists. More options with examples in the following document: [```mkdir``` command](https://www.geeksforgeeks.org/mkdir-command-in-linux-with-examples/). 
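
As a minimal sketch of the `-p` behaviour described above, both subdirectories can also be created with a single command. The scratch directory from `mktemp -d` is an assumption added so the example can be run anywhere without touching a real repository; in Codespaces the `mkdir` line would be run from the repository root instead.

```shell
# Work in a throwaway directory (assumption: stands in for the repository root)
cd "$(mktemp -d)"

# -p creates the missing parent "data" first, then each subdirectory;
# running the same command a second time is not an error
mkdir -p data/timestamps data/weather
mkdir -p data/timestamps data/weather

# List the resulting tree to confirm both subdirectories exist
ls -R data
```

Without `-p`, `mkdir data/weather` fails when `data` does not exist yet, which is why the flag is only needed for the first directory created.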

##### Notes on Completing the Task
* I referred to the Week 5 video lessons by Ian McLoughlin to guide me through the process.
* I made sure I was in the root of my repository within Codespaces so that the directories were created in the correct location.
* I practiced creating and deleting the directories multiple times to familiarize myself with saving changes to my GitHub account.

***
### Task 2: Timestamps.

<em>Navigate to the data/timestamps directory. Use the date command to output the current date and time, appending the output to a file named now.txt. Make sure to use the >> operator to append (not overwrite) the file. Repeat this step ten times, then use the more command to verify that now.txt has the expected content.</em>
***
**Steps Taken to Complete the Task:**

1. Logged into my GitHub account.
2. Started Codespaces.
3. Navigated to the data/timestamps directory.
4. Practiced displaying the year using the following command: `date +"%Y"`
    * (This command outputs the current year. See task2.png for reference).
5. Practiced displaying the full date and time in the format YYYY/MM/DD HH:MM:SS using: `date +"%Y/%m/%d %H:%M:%S"`
6. Ran the same command again but redirected the output to a file named now.txt: `date +"%Y/%m/%d %H:%M:%S" > now.txt`
7. Ran the command 9 more times (10 total, as required) with the `>>` operator, appending each output to the now.txt file.
8. Used the `more now.txt` command to display the contents of the file and check that the commands had worked as expected (see the attached image).

<img src="./img/task2.png" alt="Task 2">

**Notes on Completing the Task:**
* The task did not specify a required date format, so I chose the format myself.
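
The ten appends could also be done with a short loop instead of repeating the command by hand. This is only a sketch: the scratch directory is an assumption standing in for `data/timestamps`, and the loop uses `>>` for every run, so the file is created by the first append.

```shell
# Throwaway directory (assumption: stands in for data/timestamps)
cd "$(mktemp -d)"

# Append the current date and time ten times; >> never overwrites
for i in $(seq 1 10); do
    date +"%Y/%m/%d %H:%M:%S" >> now.txt
done

# Count the lines to confirm there are ten entries
wc -l < now.txt
```

Because the loop finishes almost instantly, most of the ten lines will show the same second, whereas running the command by hand spreads the timestamps out.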

***
### Task 3: Formatting Timestamps.

<em>Run the date command again, but this time format the output using YYYYmmdd_HHMMSS (e.g., 20261114_130003 for 1:00:03 PM on November 14, 2026). Refer to the date man page (using man date) for more formatting options. (Press q to exit the man page). Append the formatted output to a file named formatted.txt.</em>
***

**Steps Taken to Complete the Task:**

1. Logged into my GitHub account.
2. Started Codespaces.
3. Navigated to the data/timestamps directory.
4. First practiced displaying the date using the command: `date +"%Y%m%d_%H%M%S"` (there is no space after `+`; `%S` is seconds, whereas lowercase `%s` would print the number of seconds since the Unix epoch).
5. Ran the command again, appending the output to the file formatted.txt with `>>`, which creates the file if it does not already exist.
6. Checked that the file existed and when it was last modified using the command `date -r formatted.txt` (see the attached picture).
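
The format string is case-sensitive, which is easy to get wrong. The sketch below contrasts `%S` (seconds, 00 to 60) with `%s` (seconds since 1970-01-01 UTC) and appends the formatted value to a file. The variable names and the scratch directory are assumptions made for illustration.

```shell
# Throwaway directory (assumption: stands in for data/timestamps)
cd "$(mktemp -d)"

stamp=$(date +"%Y%m%d_%H%M%S")   # YYYYmmdd_HHMMSS, the format the task asks for
epoch=$(date +"%s")              # lowercase s: seconds since the Unix epoch

echo "formatted: $stamp"
echo "epoch:     $epoch"

# Append the formatted timestamp, creating formatted.txt if needed
echo "$stamp" >> formatted.txt
more formatted.txt
```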

<img src="./img/task3.png" alt="Task 3">


***
### Task 4: Create Timestamped Files

<em>Use the touch command to create an empty file with a name in the YYYYmmdd_HHMMSS.txt format. You can achieve this by embedding your date command in backticks ` into the touch command. You should no longer use redirection (>>) in this step.</em>
***

**Steps Taken to Complete the Task:**

1. Logged into my GitHub/Codespaces account.
2. Navigated to the data/timestamps directory.
3. Ran the command: ``touch `date +"%Y%m%d_%H%M%S"`.txt``
4. The `touch` command creates a new empty file, and the backtick-enclosed `` `date +"%Y%m%d_%H%M%S"` `` is replaced by its output, so the file is named after the current date and time.
5. Used the `ls` command to list the contents of the timestamps directory and confirm that the file was created.


<img src="./img/task4.png" alt="Task 4" style="width: 700px">

**Notes on Completing the Task:**

* `touch` is a very useful Linux command: it can create one or several empty files at the same time, and it updates the timestamps of files that already exist. I also tried the command with some other options:
    - ```touch file1.txt file2.txt file3.txt``` - creates multiple files at the same time.
    - ```touch -c filename``` - does not create the file if it does not already exist.
    - ```touch -a document.txt``` - update only the access time of document.txt.
    - ```touch -m file.txt``` - update only the modification time of the file.

Reference: https://www.serveracademy.com/blog/how-to-use-the-touch-command-in-linux/
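
Backticks are one of two spellings of command substitution; the `$(...)` form does the same job and is easier to read and nest. The sketch below creates one file with each form. The scratch directory and the one-second pause (so the two names differ) are assumptions added for the example.

```shell
# Throwaway directory (assumption: stands in for data/timestamps)
cd "$(mktemp -d)"

# Backtick form, as the task suggests
touch `date +"%Y%m%d_%H%M%S"`.txt

# Wait a second so the next timestamp is different
sleep 1

# Equivalent $(...) form; the quotes protect against word splitting
touch "$(date +"%Y%m%d_%H%M%S").txt"

# Two empty, timestamp-named files should now be listed
ls
```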

***
### Task 5: Download Today's Weather Data

<em>Change to the data/weather directory. Download the latest weather data for the Athenry weather station from Met Eireann using wget. Use the -O <filename> option to save the file as weather.json. The data can be found at this URL: https://prodapi.metweb.ie/observations/athenry/today.</em>
***

**Steps Taken to Complete the Task:**

1. Logged into my GitHub/Codespaces account.  
2. Navigated to the `data/weather` directory.  
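3. Executed the command: `wget https://prodapi.metweb.ie/observations/athenry/today`. The command ran successfully, but without the `-O` option `wget` names the output file after the last part of the URL (here `today`), so the data was not saved as `weather.json`.  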
4. Used the updated command: `wget -O weather.json https://prodapi.metweb.ie/observations/athenry/today`. This created a file named `weather.json` in the current working directory, storing the downloaded data.  
5. To view the contents of `weather.json` in a readable format, I used the command: `cat weather.json | jq`. This displayed the file contents in a well-formatted manner.  


<img src="./img/task5-1.png" alt="Task 5.1" style="width: 700px;"/>

**Picture 1:** Using `wget` command to download data

<img src="./img/task5-2.png" alt="Task 5.2" style="width: 700px">

**Picture 2:** Using `wget -O` option to create the file requested.

<img src="./img/task5-3.png" alt="Task 5.3" style="width: 700px">

**Picture 3:** Displaying/reading weather.json in a readable format.

**Notes on Completing the Task:**

The command-line utility `wget` in Linux/Unix is a tool used for downloading files from the internet. It supports protocols such as HTTP, HTTPS, and FTP. This versatile utility is especially useful because it can handle downloads reliably, even on slow or unstable networks.  

Key benefits of using the `wget` command include:  
- The ability to resume interrupted downloads.  
- Bandwidth limiting options, allowing better control over network resource usage.  
- Easy integration into scripts or automated workflows, making repetitive tasks more efficient.  

References:
- [wget command](https://phoenixnap.com/kb/wget-command-with-examples)
- [json with jq](https://shapeshed.com/jq-json/)

***
### Task 6: Timestamp the Data

<em>Modify the command from Task 5 to save the downloaded file with a timestamped name in the format YYYYmmdd_HHMMSS.json.</em>
***

**Steps Taken to Complete the Task:**

1. Logged into my GitHub account and accessed the repository.  
2. Launched Codespaces.  
3. Opened the `data/weather` directory.  
4. This task can be completed in two ways:  
   - Using the `wget` command (refer to the images below).  
   - Using the `curl` command (refer to the images below).   

<img src="./img/task6-2.png" alt="Task 6.2" style="width: 700px">

**Picture 1:** Using `wget` to download and save a file with a timestamped name.

<img src="./img/task6-3.png" alt="Task 6.3" style="width: 700px">

**Picture 2:** Using `curl` to download and save a file with a timestamped name.
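
For readers who cannot see the screenshots, the two approaches can be sketched as follows. The function names `stamp_name`, `fetch_with_wget`, and `fetch_with_curl` are hypothetical helpers introduced here, and the `-s` (silent) flag for `curl` is an added assumption; the exact commands in the pictures may differ. The functions are only defined, not called, so nothing is downloaded until one of them is invoked.

```shell
URL="https://prodapi.metweb.ie/observations/athenry/today"

# Hypothetical helper: prints a name such as 20241217_201319.json
stamp_name() {
    echo "$(date +"%Y%m%d_%H%M%S").json"
}

# wget: -O (capital letter O) sets the output filename
fetch_with_wget() {
    wget -O "$(stamp_name)" "$URL"
}

# curl: -o (lowercase) sets the output filename, -s hides the progress meter
fetch_with_curl() {
    curl -s -o "$(stamp_name)" "$URL"
}

# Show the filename that would be used right now
stamp_name
```

To download, change to `data/weather` and call `fetch_with_wget` or `fetch_with_curl`.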


**Notes on Completing the Task:**

In Task Five, I provided a brief overview of the `wget` command. Another highly versatile command for downloading data on Linux/Unix systems is the `curl` command. `Curl` is a free, open-source, command-line tool that allows users and developers to transfer data without a graphical interface.  

Unlike `wget`, `curl` offers greater flexibility in handling data transfer protocols and headers, making it well suited to complex HTTP requests and working with APIs. When it comes to interacting with web APIs or managing advanced network operations, `curl` is often the preferred choice.  

Key advantages of the `curl` command:
- Facilitates data transfer to and from servers.  
- Provides detailed output with the `-v` option and supports tracing.  
- Offers extensive control over request headers.  


References:

- [Linux curl command](https://phoenixnap.com/kb/curl-command)
- [Difference Between wget VS curl](https://www.geeksforgeeks.org/difference-between-wget-vs-curl/)

***
### Task 7: Write the Script

<em>Write a bash script called weather.sh in the root of your repository. This script should automate the process from Task 6, saving the weather data to the data/weather directory. Make the script executable and test it by running it.</em>
***

**Steps Taken to Complete the Task:**

1. Logged into my GitHub account and accessed the repository.
2. Launched Codespaces.
3. Created the file and wrote the script.
4. Before executing the script, I made it executable using the command: `chmod u+x ./weather.sh`. This command adds execute permission for the file's owner, changing the owner's permissions from 'rw' (read/write) to 'rwx' (read/write/execute).
5. Ran the script with `./weather.sh` and confirmed it worked by checking that the file was created and the data was saved in the `data/weather` directory.
6. Executed the script several times, making small changes, such as displaying the date and time at the start and end of execution and showing messages when the download started and completed.

<img src="./img/task7-1.png" alt="Task 7.1" style="width: 700px">

**Picture 1:** Changing weather.sh file permission

<img src="./img/task7-3.png" alt="Task 7.3" style="width: 700px">

**Picture 2:** Checking that the execution of ./weather.sh works.


**Notes on Completing the Task:**

#### Explanation of `weather.sh` Script:

1. **`#! /bin/bash`**  
   - The shebang (`#!`) specifies the interpreter (`/bin/bash`) that will execute the script. It ensures the script is parsed using the Bash shell.

2. **`date`**  
   - Prints the current date and time to the terminal when the script starts executing. This helps track when the script begins.

3. **`echo "Downloading weather data."`**  
   - Displays a message to inform the user that the weather data download is starting. This improves user experience by providing feedback.

4. **``wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today``**  
   - Downloads weather data from the specified URL (`https://prodapi.metweb.ie/observations/athenry/today`).  
   - Saves the data to the `data/weather` directory (relative to the directory the script is run from) with a filename that includes a timestamp in the format `YYYYmmdd_HHMMSS.json`. The `wget` command fetches the data, and the `-O` flag specifies the output file's name and path.

5. **`echo "Weather data downloaded."`**  
   - Prints a message saying that the download step has finished. Because the script does not check the exit status of `wget`, this message appears even if the download failed.

6. **`date`**  
   - Prints the current date and time again, marking the script's completion. This provides clarity on when the script finishes execution.
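
Putting the six lines above together gives the sketch below. It is assembled from the explanation only, so the real `weather.sh` may differ in spacing or comments. The heredoc and the scratch directory are assumptions used here to write the file from the command line without touching a real repository; the script is created and made executable but not run.

```shell
# Throwaway directory (assumption: stands in for the repository root)
cd "$(mktemp -d)"

# Write the script; the quoted 'EOF' stops the shell expanding the backticks now
cat > weather.sh <<'EOF'
#! /bin/bash
date
echo "Downloading weather data."
wget -O data/weather/`date +"%Y%m%d_%H%M%S.json"` https://prodapi.metweb.ie/observations/athenry/today
echo "Weather data downloaded."
date
EOF

# Add execute permission for the owner, then show the permissions
chmod u+x ./weather.sh
ls -l weather.sh
```

From the repository root the script is then run as `./weather.sh`. It expects the `data/weather` directory from Task 1 to exist already, because `wget -O` does not create missing directories.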

References:

- [Data Camp - Bash Script Tutorial](https://www.datacamp.com/tutorial/how-to-write-bash-script-tutorial?dc_referrer=https%3A%2F%2Fwww.google.com%2F)
- [ChatGPT](https://chatgpt.com/)

***
### Task 8: Notebook

<em>Create a notebook called weather.ipynb at the root of your repository. In this notebook, write a brief report explaining how you completed Tasks 1 to 7. Provide short descriptions of the commands used in each task and explain their role in completing the tasks.</em>

***
### Task 9: pandas

<em>In your weather.ipynb notebook, use the pandas function read_json() to load in any one of the weather data files you have downloaded with your script. Examine and summarize the data. Use the information provided data.gov.ie to write a short explanation of what the data set contains.</em>

##### About the dataset:

The dataset holds hourly weather observations for the current day in Athenry, Co. Galway, provided by Met Éireann. It includes data such as temperature, weather description, wind speed and direction, humidity, rainfall, pressure, and observation time. The dataset is updated continuously in local time, is not quality-controlled, and is available for download in .json and .csv formats. 

Additional details:
- Language: English  
- Geographic coverage: Athenry, with coordinates (-8.8, 53.3) in GeoJSON format  
- Spatial Reference Systems (SRS): WGS 84 (EPSG:4326)  
- Time period: Today  
- High Value Dataset (HVD): Yes, in the Meteorological category
- More information: [Today's Weather Athenry](https://data.gov.ie/dataset/todays-weather-athenry)
***

#### Step 1: Importing the libraries needed for completing the task.

In [1]:
# Importing libraries.
import pandas as pd 
import os
import glob

##### Reading a .json file

For this task I imported the glob and os libraries and used them to find and read the latest JSON file downloaded to the data/weather directory.

In [2]:
# Filepath of the directory containing the JSON files
directory = 'data/weather'

# Find all JSON files in the directory, excluding the 'weather.json' file from Task 5
json_files = [f for f in glob.glob(os.path.join(directory, '*.json')) if 'weather' not in os.path.basename(f)]

# Stop early if there are no JSON files, so that df is never left undefined
if not json_files:
    raise FileNotFoundError(f"No JSON files found in {directory}.")

# Pick the latest file based on the filename (assumes YYYYmmdd_HHMMSS format,
# which sorts chronologically when compared as text)
latest_file = max(json_files, key=lambda x: os.path.basename(x).split('.')[0])

# Read the contents of the latest JSON file into a Pandas DataFrame
df = pd.read_json(latest_file)

# Print the name of the latest file for verification
print(f"Latest file read: {latest_file}")

# checking that the dataset is loaded to our frame.
df.head(5)

Latest file read: data/weather\20241217_201319.json


| | name | temperature | symbol | weatherDescription | text | windSpeed | windGust | cardinalWindDirection | windDirection | humidity | rainfall | pressure | dayName | date | reportTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Athenry | 7 | 04n | Cloudy | "Cloudy" | 6 | - | SE | 135 | 94 | 0.0 | 1023 | Tuesday | 2024-12-17 | 00:00 |
| 1 | Athenry | 8 | 04n | Cloudy | "Cloudy" | 4 | - | SE | 135 | 93 | 0.0 | 1022 | Tuesday | 2024-12-17 | 01:00 |
| 2 | Athenry | 10 | 04n | Cloudy | "Cloudy" | 13 | - | S | 180 | 89 | 0.0 | 1021 | Tuesday | 2024-12-17 | 02:00 |
| 3 | Athenry | 10 | 04n | Cloudy | "Cloudy" | 15 | - | SE | 135 | 87 | 0.0 | 1019 | Tuesday | 2024-12-17 | 03:00 |
| 4 | Athenry | 11 | 05n | Rain showers | "Rain shower" | 19 | - | S | 180 | 86 | 0.01 | 1018 | Tuesday | 2024-12-17 | 04:00 |


Let's begin by exploring the dataset and examining the column headers to understand the types of recordings contained within.

In [3]:
# Getting information about our DataFrame: number of entries, column names, non-null counts, data types for each column and memory usage.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   name                   20 non-null     object        
 1   temperature            20 non-null     int64         
 2   symbol                 20 non-null     object        
 3   weatherDescription     20 non-null     object        
 4   text                   20 non-null     object        
 5   windSpeed              20 non-null     int64         
 6   windGust               20 non-null     object        
 7   cardinalWindDirection  20 non-null     object        
 8   windDirection          20 non-null     int64         
 9   humidity               20 non-null     int64         
 10  rainfall               20 non-null     float64       
 11  pressure               20 non-null     int64         
 12  dayName                20 non-null     object        
 13  date   

In [4]:
df.describe()

| | temperature | windSpeed | windDirection | humidity | rainfall | pressure | date |
|---|---|---|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20.0 | 20 |
| mean | 11.15 | 18.35 | 166.5 | 87.55 | 0.053 | 1010.0 | 2024-12-17 00:00:00 |
| min | 7.0 | 4.0 | 135.0 | 81.0 | 0.0 | 996.0 | 2024-12-17 00:00:00 |
| 25% | 11.0 | 17.0 | 135.0 | 86.0 | 0.0 | 1002.5 | 2024-12-17 00:00:00 |
| 50% | 11.0 | 19.0 | 180.0 | 87.0 | 0.01 | 1011.5 | 2024-12-17 00:00:00 |
| 75% | 12.0 | 22.0 | 180.0 | 89.0 | 0.0325 | 1017.25 | 2024-12-17 00:00:00 |
| max | 13.0 | 26.0 | 180.0 | 95.0 | 0.5 | 1023.0 | 2024-12-17 00:00:00 |
| std | 1.565248 | 5.499043 | 21.157306 | 3.425523 | 0.11797 | 8.885233 | |


In [16]:
print(df['windGust'].unique())
print(df['cardinalWindDirection'].unique())
print(df['text'].unique())
print(df['dayName'].unique())

['-' '41' '39' '43' '48' '44' '46']
['SE' 'S']
['"Cloudy"' '"Rain shower"' '"Light rain "' '"Recent Rain"'
 '"Moderate rain "' '"Light Drizzle "' '"Recent Drizzle "']
['Tuesday']


***
### Notes on the project.

- While working on Task 2, I encountered an issue where the images wouldn't display on GitHub. To resolve this, I had to switch the repository's visibility from private to public.

### References:

- Creating files using touch command: https://www.geeksforgeeks.org/touch-command-in-linux-with-examples/
- Displaying a .json file in terminal(bash): https://linuxopsys.com/read-json-file-in-shell-script
- Working with images - resizing: https://stackoverflow.com/questions/41598916/resize-the-image-in-jupyter-notebook-using-markdown
