# Programming for Data Analysis Assignment 1 Notebook

Author - Sean Humphreys

---

## Contents

1. [Introduction](#introduction)
2. [Definitions](#definitions)
3. [Dataset Exploration](#dataset-exploration)
4. [Dataset Variables](#dataset-variables)
5. [References](#references)
6. [Associated Reading](#associated-reading)

---

## Introduction <a id="introduction"></a>

The dataset explored and synthesised in this notebook comes from a weather sensor in the author's back garden. The data extract is a CSV file, which can be accessed [here](datasets/back_garden_sensor_data_12_months.csv). It contains local weather data captured over a 12-month period.

---

## Definitions <a id="definitions"></a>

- [Pandas](https://pandas.pydata.org/) (last accessed 03 Nov. 2023) is an open-source software library for data analysis and manipulation, built on top of the *Python* programming language. The DataFrame is the primary Pandas data structure: a dictionary-like container for Series objects.
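
As a minimal sketch of the "dictionary-like container for Series objects" idea, the snippet below builds a small DataFrame from two Series and retrieves one of them by key. The values are illustrative only and are not taken from the sensor extract.

```python
import pandas as pd

# Two hypothetical Series that share the same default integer index
temperature = pd.Series([12.7, 14.6, 15.4])
humidity = pd.Series([84.0, 81.0, 82.0])

# A dict of Series becomes a DataFrame: keys are column names, values are columns
sample = pd.DataFrame({
    "temperature_average_celsius": temperature,
    "humidity_%": humidity,
})

# Looking up a column by key returns a Series, much like a dict lookup
column = sample["humidity_%"]
print(type(column).__name__)
print(list(sample.keys()))
```

Each column keeps its own dtype, which is why `info()` reports a dtype per column rather than one for the whole DataFrame.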

---

## Dataset Exploration <a id="dataset-exploration"></a>

Import the Pandas software library. Pandas can be used to clean and process datasets.

In [10]:
# import the required Python libraries
import pandas as pd

Read in the source data CSV file, rename the columns to Python-friendly strings and inspect the Pandas DataFrame.

In [11]:
# Python-friendly column names
column_rename = ['date_time', 'wind_speed_m_s', 'gust_m_s', 'humidity_%', 'chill_celsius', 'chill_minimum_celsius',
                 'temperature_average_celsius', 'temperature_range_low_celsius', 'temperature_range_high_celsius']

# use pandas to read in the dataset; header=0 discards the original header row
# and names= replaces it with the Python-friendly strings
garden_weather = pd.read_csv('datasets/back_garden_sensor_data_12_months.csv', header=0, names=column_rename)

garden_weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 359 entries, 0 to 358
Data columns (total 9 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   date_time                       359 non-null    object 
 1   wind_speed_m_s                  359 non-null    float64
 2   gust_m_s                        359 non-null    float64
 3   humidity_%                      359 non-null    float64
 4   chill_celsius                   359 non-null    float64
 5   chill_minimum_celsius           359 non-null    float64
 6   temperature_average_celsius     359 non-null    float64
 7   temperature_range_low_celsius   359 non-null    float64
 8   temperature_range_high_celsius  359 non-null    float64
dtypes: float64(8), object(1)
memory usage: 25.4+ KB
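
The combination of `header=0` and `names=` used above can be tried on a small in-memory sample. This is only a sketch: the original header text (`Date Time`, `Wind Speed (m/s)`) is an assumption made up for illustration, as the real header row of the sensor extract is not shown in this notebook.

```python
import io
import pandas as pd

# Hypothetical two-column extract; the header text is invented for illustration
raw_csv = io.StringIO(
    "Date Time,Wind Speed (m/s)\n"
    "09/11/2022 00:00,3.3\n"
    "10/11/2022 00:00,3.2\n"
)

# header=0 identifies the first row as the existing header so it is dropped,
# and names= supplies the replacement column labels
sample = pd.read_csv(raw_csv, header=0, names=["date_time", "wind_speed_m_s"])

print(sample.columns.tolist())
print(len(sample))
```

Without `header=0`, pandas would treat the original header row as a data row once `names=` is supplied, so the two arguments are best passed together when renaming on read.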


Visually inspect the top and bottom of the DataFrame to ascertain whether any data entries may cause problems.

In [12]:
garden_weather.head()

| | date_time | wind_speed_m_s | gust_m_s | humidity_% | chill_celsius | chill_minimum_celsius | temperature_average_celsius | temperature_range_low_celsius | temperature_range_high_celsius |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 09/11/2022 00:00 | 3.3 | 6.7 | 84.0 | 12.9 | 11.8 | 12.74 | 12.4 | 13.0 |
| 1 | 10/11/2022 00:00 | 3.2 | 11.5 | 81.0 | 16.9 | 12.1 | 14.59 | 12.3 | 16.9 |
| 2 | 11/11/2022 00:00 | 3.1 | 12.5 | 82.0 | 16.9 | 14.4 | 15.4 | 14.4 | 16.9 |
| 3 | 12/11/2022 00:00 | 0.6 | 2.3 | 87.0 | 13.6 | 12.6 | 13.27 | 12.6 | 13.6 |
| 4 | 13/11/2022 00:00 | 0.8 | 5.5 | 87.0 | 14.8 | 12.3 | 13.95 | 12.3 | 14.8 |


In [13]:
garden_weather.tail()

| | date_time | wind_speed_m_s | gust_m_s | humidity_% | chill_celsius | chill_minimum_celsius | temperature_average_celsius | temperature_range_low_celsius | temperature_range_high_celsius |
|---|---|---|---|---|---|---|---|---|---|
| 354 | 30/10/2023 00:00 | 0.0 | 0.6 | 93.0 | 13.9 | 4.7 | 8.94 | 4.7 | 13.9 |
| 355 | 31/10/2023 00:00 | 0.6 | 2.5 | 93.0 | 15.9 | 5.5 | 9.91 | 5.5 | 15.9 |
| 356 | 01/11/2023 00:00 | 2.5 | 7.6 | 90.0 | 12.8 | 3.0 | 8.79 | 3.0 | 12.9 |
| 357 | 02/11/2023 00:00 | 1.9 | 5.6 | 88.0 | 12.4 | 4.9 | 7.64 | 5.0 | 12.4 |
| 358 | 03/11/2023 00:00 | 2.0 | 8.3 | 89.601351 | 12.8 | 4.6 | 8.248322 | 5.3 | 12.8 |
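
One entry type that may cause problems is `date_time`: `info()` reports it as `object`, and the values shown are day-first strings such as `30/10/2023 00:00`. The snippet below is a minimal sketch of converting such strings to datetimes with an explicit format, using a small stand-in DataFrame rather than the full extract. The format string is an assumption inferred from the rows displayed above.

```python
import pandas as pd

# Stand-in for the date_time column, using strings copied from the displayed rows
sample = pd.DataFrame({
    "date_time": ["09/11/2022 00:00", "10/11/2022 00:00", "30/10/2023 00:00"],
})

# An explicit day-first format avoids 09/11 being read as 11 September
sample["date_time"] = pd.to_datetime(sample["date_time"], format="%d/%m/%Y %H:%M")

first = sample["date_time"].iloc[0]
print(first.day, first.month, first.year)
print(sample["date_time"].dtype)
```

Passing an explicit `format` is a deliberate choice here: ambiguous values like `09/11/2022` parse without error under a month-first reading, so a wrong guess would go unnoticed.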


Inspect the dataset for null values.

In [14]:
garden_weather.isnull().sum()

date_time                         0
wind_speed_m_s                    0
gust_m_s                          0
humidity_%                        0
chill_celsius                     0
chill_minimum_celsius             0
temperature_average_celsius       0
temperature_range_low_celsius     0
temperature_range_high_celsius    0
dtype: int64

No null values are present.
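
For comparison, the sketch below shows what the same check reports when a value is missing. The DataFrame is a hypothetical stand-in with one deliberately inserted `NaN`; it is not part of the sensor data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing humidity reading
sample = pd.DataFrame({
    "humidity_%": [84.0, np.nan, 82.0],
    "gust_m_s": [6.7, 11.5, 12.5],
})

# isnull() flags each missing cell; sum() counts the flags per column
null_counts = sample.isnull().sum()

# Summing again gives a single total for the whole DataFrame
total_nulls = int(null_counts.sum())

print(null_counts)
print(total_nulls)
```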

Inspect the dataset for any duplicate entries.

In [15]:
duplicate_rows = garden_weather[garden_weather.duplicated()]

duplicate_rows

| | date_time | wind_speed_m_s | gust_m_s | humidity_% | chill_celsius | chill_minimum_celsius | temperature_average_celsius | temperature_range_low_celsius | temperature_range_high_celsius |
|---|---|---|---|---|---|---|---|---|---|


No duplicate rows are present: the filtered DataFrame is empty.
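
By default `duplicated()` compares whole rows and flags only the second and later copies. The sketch below, on a hypothetical three-row sample, shows that default alongside the `keep=False` and `subset=` options, which may be useful if two readings could share a timestamp while differing elsewhere.

```python
import pandas as pd

# Hypothetical sample in which the last row repeats the second
sample = pd.DataFrame({
    "date_time": ["09/11/2022 00:00", "10/11/2022 00:00", "10/11/2022 00:00"],
    "wind_speed_m_s": [3.3, 3.2, 3.2],
})

# Default: only the later copy of each repeated row is flagged
repeats = sample[sample.duplicated()]

# keep=False flags every copy, including the first occurrence
all_copies = sample[sample.duplicated(keep=False)]

# subset= restricts the comparison to the named columns
repeated_dates = sample[sample.duplicated(subset=["date_time"])]

print(repeats.index.tolist())
print(all_copies.index.tolist())
```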

Generate summary statistics for the numeric columns.

In [16]:
garden_weather.describe()

| | wind_speed_m_s | gust_m_s | humidity_% | chill_celsius | chill_minimum_celsius | temperature_average_celsius | temperature_range_low_celsius | temperature_range_high_celsius |
|---|---|---|---|---|---|---|---|---|
| count | 359.0 | 359.0 | 359.0 | 359.0 | 359.0 | 359.0 | 359.0 | 359.0 |
| mean | 1.698329 | 5.204735 | 81.16602 | 17.349025 | 7.145404 | 11.561806 | 7.41922 | 17.40585 |
| std | 1.105596 | 2.961325 | 5.671424 | 6.558507 | 4.898555 | 5.100804 | 4.684545 | 6.486319 |
| min | 0.0 | 0.0 | 60.0 | 3.4 | -5.8 | -1.75 | -5.8 | 3.4 |
| 25% | 0.8 | 3.0 | 78.0 | 11.8 | 3.65 | 7.72 | 4.1 | 11.9 |
| 50% | 1.5 | 4.6 | 81.0 | 17.4 | 7.6 | 11.65 | 7.7 | 17.4 |
| 75% | 2.3 | 6.7 | 85.0 | 23.05 | 11.2 | 15.46 | 11.2 | 23.05 |
| max | 5.5 | 15.6 | 94.0 | 32.8 | 17.8 | 22.51 | 17.8 | 32.8 |
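
To make the rows of this summary easier to interpret, the sketch below relates each `describe()` statistic to the individual method that computes it, using a small hypothetical Series rather than the sensor data. Note that `std` is the sample standard deviation (`ddof=1`) and `50%` is the median.

```python
import pandas as pd

# Hypothetical wind speeds chosen so the statistics are easy to verify by hand
wind = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], name="wind_speed_m_s")

summary = wind.describe()

# Each row of describe() matches a dedicated Series method
mean_matches = summary["mean"] == wind.mean()
median_matches = summary["50%"] == wind.median()
std_matches = abs(summary["std"] - wind.std(ddof=1)) < 1e-12
quartile_matches = summary["25%"] == wind.quantile(0.25)

print(summary)
```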


## Dataset Variables <a id="dataset-variables"></a>

## References <a id="references"></a>

---

## Associated Reading <a id="associated-reading"></a>

Pandas (2018). Python Data Analysis Library — pandas: Python Data Analysis Library. [online] Pydata.org. Available at: https://pandas.pydata.org/. [Accessed 03 Nov. 2023].

pandas.pydata.org. (n.d.). API reference — pandas 1.1.4 documentation. [online] Available at: https://pandas.pydata.org/docs/reference/index.html. [Accessed 03 Nov. 2023].

Tamboli, N. (2021). Tackling Missing Value in Dataset. [online] Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2021/10/handling-missing-value/. [Accessed 18 Oct. 2023].

Zach (2021). How to Find Duplicates in Pandas DataFrame (With Examples). [online] Statology. Available at: https://www.statology.org/pandas-find-duplicates/. [Accessed 03 Nov. 2023].

---

*Notebook Ends*