## 1. Mapper Script for Wind Speed Difference, Minimum Relative Humidity, and Dew Point Temp Analysis

## 1.1 Uploading the dataset to HDFS

First I will create a new directory called weather on the cluster head node using mkdir and then navigate to it using the cd command:

In [None]:
[xxx@dsm1 ~]$ mkdir weather
[xxx@dsm1 ~]$ cd weather

In [None]:
# Next I will copy the data from local disk to the cluster
scp "/Users/jerid/Desktop/200707hourly.txt" xxx@dsm1.doc.gold.ac.uk:~/weather/

#To verify it is on the cluster I will use the ls command

[xxx@dsm1 weather]$ ls

200707hourly.txt 
    
# I can see it exists and now I will copy the file onto HDFS
hadoop fs -copyFromLocal weather/200707hourly.txt

# Verifying that it's on HDFS

[xxx@dsm1 ~]$ hadoop fs -ls /user/xxx/
Found 20 items
-rw-r--r--   3 xxx hadoop  101449102 2024-03-07 10:32 /user/xxx/200707hourl

## 1.2 Testing the mapper on a smaller file in jupyter notebook

First, I will create a smaller file containing the first 100 lines of the dataset. The code is commented out because I have already run it. Testing on this small file is quicker and makes debugging easier. Once I am confident in the script's behaviour in Python, I will run it on the full dataset in Hadoop.

In [None]:
#num_lines = 100

#with open('200707hourly.txt', 'r') as infile, open('200707hourly_tiny.txt', 'w') as outfile:
    #for i, line in enumerate(infile):
        #if i >= num_lines:
            #break
        #outfile.write(line)
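
The same sampling step can also be written with `itertools.islice`, which stops reading after the requested number of lines. This is only a sketch of an alternative: the helper name `write_sample` and its parameters are my own, not part of the original notebook.

```python
from itertools import islice


def write_sample(src_path, dst_path, num_lines=100):
    """Copy the first num_lines lines of src_path into dst_path.

    Returns the number of lines actually written, which is smaller than
    num_lines when the source file is shorter.
    """
    with open(src_path, "r") as infile, open(dst_path, "w") as outfile:
        # islice reads lazily, so the rest of the large file is never loaded
        sample = list(islice(infile, num_lines))
        outfile.writelines(sample)
    return len(sample)
```

Calling `write_sample('200707hourly.txt', '200707hourly_tiny.txt')` would produce the same 100-line test file as the commented-out loop above.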

Next, I will write the mapper in Python and test it on the smaller file.

In [13]:
# Opening the file for reading and the mapper_output file for writing

with open("200707hourly_tiny.txt", "r") as file, open("mapper_output.txt", "w") as outfile:
    for line in file:
        # I will split the line into columns
        
        columns = line.strip().split(',')
        
        try:
            # Next I will extract the necessary fields
            
            date = columns[1]
            wind_speed = float(columns[12])  # python reads first column as 0 so wind speed is at index 12
            relative_humidity = float(columns[11].strip('%'))  # I need to remove the '%' in relative humidity 
            dew_point_temp = float(columns[9])  
            
            # I will need to emit data for wind speed analysis.
            outfile.write(f"{date}_WS\t{wind_speed}\n") 
            
            # I will also emit data for relative humidity analysis 
            outfile.write(f"{date}_RH\t{relative_humidity}\n")
            
            # Finally, I will emit data for dew point temperature analysis 
            outfile.write(f"{date}_DPT\t{dew_point_temp}\n")
            
        except (ValueError, IndexError):
            # Skip the header row and any record with missing or non-numeric fields
            continue


I will now open and read the first few lines of the file

In [27]:
#Results

with open("mapper_output.txt", "r") as file:
    for i, line in enumerate(file):
        print(line.strip())
        if i >= 4:  
            break


20070701_WS	3.0
20070701_RH	22.0
20070701_DPT	18.0
20070701_WS	3.0
20070701_RH	23.0
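
To make the mapping logic easier to test without touching any files, it can be wrapped in a small function that returns the key/value pairs for one record. This is a sketch under assumptions: the function name `map_line` is hypothetical, the column positions (1, 9, 11, 12) are the ones used in the mapper above, and the sample record is made up rather than taken from the dataset.

```python
def map_line(line):
    """Return (key, value) pairs for one CSV record, or [] if it cannot be parsed."""
    columns = line.strip().split(',')
    try:
        date = columns[1]
        dew_point_temp = float(columns[9])
        relative_humidity = float(columns[11].strip('%'))
        wind_speed = float(columns[12])
    except (ValueError, IndexError):
        # Header rows, short rows and non-numeric fields all end up here
        return []
    return [
        (f"{date}_WS", wind_speed),
        (f"{date}_RH", relative_humidity),
        (f"{date}_DPT", dew_point_temp),
    ]


# Synthetic 13-column record (made-up values, only for illustration)
fields = ["0"] * 13
fields[1], fields[9], fields[11], fields[12] = "20070701", "18", "22", "3"
pairs = map_line(",".join(fields))

# A line with too few columns is skipped rather than raising an error
skipped = map_line("a,b")
```

Here `pairs` holds the three keyed values for the synthetic record and `skipped` is an empty list.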


Now that it works smoothly, I will create the mapper.py file within the cluster.

In [None]:
# Creating the mapper file using touch command
touch mapper.py

# Opening the mapper file with nano and will then paste the code below and save
nano mapper.py

In the nano editor I will paste the code below and save it. It reads from stdin and writes to stdout because that is how Hadoop Streaming communicates with mappers and reducers:

In [None]:
#!/usr/bin/env python
import sys

for line in sys.stdin:
    
    columns = line.strip().split(',')
    
    try:
        
        date = columns[1]
        wind_speed = float(columns[12]) 
        relative_humidity = float(columns[11].strip('%')) 
        dew_point_temp = float(columns[9])  
        
       
        print(f"{date}_WS\t{wind_speed}")
        
        
        print(f"{date}_RH\t{relative_humidity}")
        
        print(f"{date}_DPT\t{dew_point_temp}")
        
    except (ValueError, IndexError):
        # Skip the header row and any record with missing or non-numeric fields
        continue
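
Before submitting a job, the map, sort and reduce stages can be imitated in plain Python. The sketch below is an assumption-laden stand-in, not how Hadoop is implemented: `run_streaming_locally`, `make_record` and the toy `mapper` and `reducer` are hypothetical names, and `list.sort()` only approximates the shuffle, which groups identical keys together before the reducer sees them.

```python
def run_streaming_locally(lines, mapper, reducer):
    """Apply mapper to every line, sort the emitted lines, then reduce them."""
    mapped = []
    for line in lines:
        mapped.extend(mapper(line))
    mapped.sort()  # stands in for Hadoop's shuffle-and-sort by key
    return list(reducer(mapped))


def mapper(line):
    """Emit one 'date_WS<TAB>value' line per parsable record."""
    cols = line.strip().split(',')
    try:
        return [f"{cols[1]}_WS\t{float(cols[12])}"]
    except (ValueError, IndexError):
        return []


def reducer(sorted_lines):
    """Yield 'key<TAB>max' for each run of identical keys."""
    current_key, current_max = None, None
    for line in sorted_lines:
        key, value = line.split('\t')
        value = float(value)
        if key == current_key:
            current_max = max(current_max, value)
        else:
            if current_key is not None:
                yield f"{current_key}\t{current_max}"
            current_key, current_max = key, value
    if current_key is not None:
        yield f"{current_key}\t{current_max}"


def make_record(date, wind_speed):
    """Build a synthetic 13-column CSV record (made-up data)."""
    fields = ["0"] * 13
    fields[1], fields[12] = date, str(wind_speed)
    return ",".join(fields)


# The dates are deliberately interleaved to show why the sort step matters
records = [
    make_record("20070702", 5),
    make_record("20070701", 3),
    make_record("20070702", 9),
    make_record("20070701", 12),
]
result = run_streaming_locally(records, mapper, reducer)
```

Without the sort, the reducer would see the two dates alternating and would emit each date more than once, which is why the reducers in the following sections rely on sorted input.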


## 2. Reducer Script for Maximum and Minimum Wind Speed Difference

Now that the mapper is created, I will compute the minimum, maximum, and other statistics in the reducers below.

## 2.1 Calculating Wind Speed Difference

Testing on small file in python:

In [15]:
file = open("mapper_output.txt", "r")

current_date = None
max_wind_speed = float('-inf')
min_wind_speed = float('inf')

for line in file.readlines():
    date_key, value = line.strip().split('\t')
    value = float(value)

    if date_key.endswith("_WS"):  # I will only process wind speed data
        date = date_key.split('_')[0]

        if date == current_date:
            max_wind_speed = max(max_wind_speed, value)
            min_wind_speed = min(min_wind_speed, value)
        else:
            if current_date:
                # I will output the difference for the previous date
                print(f"{current_date}\t{max_wind_speed - min_wind_speed}")
            current_date = date
            max_wind_speed = min_wind_speed = value

# I will output the last date after the loop
if current_date:
    print(f"{current_date}\t{max_wind_speed - min_wind_speed}")


20070701	9.0
20070702	7.0
20070703	7.0
20070704	11.0
20070705	0.0


Since it is successful I will create the py file within the cluster and copy in the code below:

In [None]:
touch reducer_max_min.py 
nano reducer_max_min.py

In [None]:
#!/usr/bin/env python
import sys

current_date = None
max_wind_speed = float('-inf')
min_wind_speed = float('inf')

for line in sys.stdin:
    date_key, value = line.strip().split('\t')
    value = float(value)
    
    if date_key.endswith("_WS"):  
        date = date_key.split('_')[0]
        
        if date == current_date:
            max_wind_speed = max(max_wind_speed, value)
            min_wind_speed = min(min_wind_speed, value)
        else:
            if current_date:
                
                print(f"{current_date}\t{max_wind_speed - min_wind_speed}")
            
            current_date = date
            max_wind_speed = min_wind_speed = value


if current_date:
    print(f"{current_date}\t{max_wind_speed - min_wind_speed}")
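
The manual tracking of `current_date` can also be expressed with `itertools.groupby`, which groups consecutive items that share a key. This is a sketch of an alternative, not the reducer that was run on the cluster: the function name `wind_speed_ranges` is hypothetical, and, like the reducer above, it assumes the input is already sorted by key.

```python
from itertools import groupby


def wind_speed_ranges(sorted_lines):
    """Yield (date, max - min) for wind speed lines; input must be sorted by key."""
    # Skip blank lines so the tab split always yields a key and a value
    parsed = (line.strip().split('\t') for line in sorted_lines if line.strip())
    ws_only = (
        (key.split('_')[0], float(value))
        for key, value in parsed
        if key.endswith("_WS")
    )
    # groupby only merges adjacent items, hence the sorted-input requirement
    for date, group in groupby(ws_only, key=lambda pair: pair[0]):
        values = [value for _, value in group]
        yield date, max(values) - min(values)


# Made-up sample lines in the mapper's output format
sample = [
    "20070701_DPT\t18.0",
    "20070701_WS\t3.0",
    "20070701_WS\t12.0",
    "20070702_WS\t7.0",
]
ranges = list(wind_speed_ranges(sample))
```

For the made-up sample, the first date has a range of 9.0 and the second, with a single reading, has a range of 0.0.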

In [None]:
# Run the mapper and reducer on the hadoop cluster

[xxx@dsm1 weather]$ hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar \
    -files mapper.py,reducer_max_min.py \
    -mapper "python3 mapper.py" \
    -reducer "python3 reducer_max_min.py" \
    -input /user/xxx/200707hourly.txt -output /user/xxx/200707_windspeed_difference


# Copy the output from HDFS onto the headnode

hadoop fs -copyToLocal /user/xxx/200707_windspeed_difference

# Download the output from the headnode onto local machine

scp -r xxx@dsm1.doc.gold.ac.uk:~/weather/200707_windspeed_difference /Users/jerid/Downloads/

In [None]:
Results of Wind Speed Difference: 

In [34]:
file = open('weather_final_results/200707_windspeed_difference/part-00000', 'r', encoding='UTF-8')

for line in file.readlines():
    print(line.strip())

20070701	31.0
20070702	80.0
20070703	37.0
20070704	29.0
20070705	37.0
20070706	64.0
20070707	51.0
20070708	37.0
20070709	36.0
20070710	82.0
20070711	37.0
20070712	32.0
20070713	31.0
20070714	30.0
20070715	36.0
20070716	35.0
20070717	35.0
20070718	39.0
20070719	12.0


## 2.2 Calculating Minimum relative humidity

Testing reducer on a smaller file within Jupyter:

In [18]:
file = open("mapper_output.txt", "r")

current_date = None
min_relative_humidity = float('inf')

for line in file.readlines():
    # I will split the line by tab to get the date key and the relative humidity value
    date_key, value = line.strip().split('\t')
    value = float(value)  # I also need to convert the string to float to compare
    
    if date_key.endswith("_RH"):  # I will use RH as I am processing only relative humidity data
        date = date_key.split('_')[0]
        
        if date == current_date:
            # I will update min if the current value is less than the current minimum
            min_relative_humidity = min(min_relative_humidity, value)
        else:
            # If it moves to a new date, I will output the minimum for the previous date, if not the first line.
            if current_date is not None:
                print(f"{current_date}\t{min_relative_humidity}")
            
            # I will now reset for the new date
            current_date = date
            min_relative_humidity = value

# Now I must output the last date after finishing all the lines
if current_date:
    print(f"{current_date}\t{min_relative_humidity}")

20070701	3.0
20070702	3.0
20070703	3.0
20070704	5.0
20070705	36.0


I will now create the py file within the cluster and copy in the code below:

In [None]:
touch reducer_min.py
nano reducer_min.py

In [None]:
#!/usr/bin/env python
import sys

current_date = None
min_relative_humidity = float('inf')

for line in sys.stdin:
   
    date_key, value = line.strip().split('\t')
    value = float(value) 
    
    if date_key.endswith("_RH"):  
        date = date_key.split('_')[0]
        
        if date == current_date:
            
            min_relative_humidity = min(min_relative_humidity, value)
        else:
            
            if current_date is not None:
                print(f"{current_date}\t{min_relative_humidity}")
            
            
            current_date = date
            min_relative_humidity = value


if current_date:
    print(f"{current_date}\t{min_relative_humidity}")
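
For local testing, where the mapper output file is not guaranteed to be sorted, a dictionary keyed by date gives the same minimum without relying on line order. This is a sketch under assumptions: the name `min_humidity_by_date` is hypothetical, and the approach keeps one entry per date in memory, which is small for a single month of data.

```python
def min_humidity_by_date(lines):
    """Return {date: minimum relative humidity}, regardless of line order."""
    minima = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        key, value = line.strip().split('\t')
        if not key.endswith("_RH"):
            continue  # only relative humidity records are of interest
        date = key.split('_')[0]
        value = float(value)
        # First reading for a date becomes its own starting minimum
        minima[date] = min(minima.get(date, value), value)
    return minima


# Made-up, deliberately unsorted sample lines
sample = [
    "20070702_RH\t30.0",
    "20070701_RH\t22.0",
    "20070702_RH\t5.0",
    "20070701_WS\t3.0",
    "20070701_RH\t23.0",
]
minima = min_humidity_by_date(sample)
```

The trade-off is that results are only available once all input has been read, whereas the streaming reducer above can emit each date as soon as the next one starts.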


In [None]:
# Run the mapper and reducer on the hadoop cluster

[xxx@dsm1 weather]$ hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar \
    -files mapper.py,reducer_min.py \
    -mapper "python3 mapper.py" \
    -reducer "python3 reducer_min.py" \
    -input /user/xxx/200707hourly.txt -output /user/xxx/200707_min_humidity


# Copy the output from HDFS onto the headnode

hadoop fs -copyToLocal /user/xxx/200707_min_humidity

# Download the output from the headnode onto local machine

scp -r xxx@dsm1.doc.gold.ac.uk:~/weather/200707_min_humidity /Users/jerid/Downloads/

In [None]:
Results:

In [35]:
file = open('weather_final_results/200707_min_humidity/part-00000', 'r', encoding='UTF-8')

for line in file.readlines():
    print(line.strip())

20070701	2.0
20070702	2.0
20070703	3.0
20070704	3.0
20070705	3.0
20070706	3.0
20070707	3.0
20070708	3.0
20070709	2.0
20070710	4.0
20070711	2.0
20070712	4.0
20070713	3.0
20070714	5.0
20070715	6.0
20070716	6.0
20070717	4.0
20070718	4.0
20070719	63.0


## 2.3 Calculating daily mean Dew Point Temp

Testing the reducer on a smaller file in jupyter notebook:

In [24]:
file = open("mapper_output.txt", "r")

current_date = None
total_dpt = 0
count_dpt = 0

for line in file.readlines():
    date_key, value = line.strip().split('\t')
    value = float(value)
    
    if date_key.endswith("_DPT"):  # I will only process dew point temp data
        date = date_key.split('_')[0]
        
        if date == current_date:
            total_dpt += value
            count_dpt += 1
        else:
            if current_date:
                # I will now output the mean for the previous date
                print(f"{current_date}\t{total_dpt / count_dpt}")
            
            current_date = date
            total_dpt = value
            count_dpt = 1

if current_date:
    print(f"{current_date}\t{total_dpt / count_dpt}")

20070701	6.3478260869565215
20070702	10.8
20070703	8.88888888888889
20070704	21.375
20070705	36.0


I will now create the py file within the cluster and copy in the code below:

In [None]:
touch reducer_mean.py
nano reducer_mean.py

In [None]:
#!/usr/bin/env python
import sys

current_date = None
total_dpt = 0
count_dpt = 0

for line in sys.stdin:
    date_key, value = line.strip().split('\t')
    value = float(value)
    
    if date_key.endswith("_DPT"):  
        date = date_key.split('_')[0]
        
        if date == current_date:
            total_dpt += value
            count_dpt += 1
        else:
            if current_date:
                
                print(f"{current_date}\t{total_dpt / count_dpt}")
            
            current_date = date
            total_dpt = value
            count_dpt = 1

# I must not forget to output the last date after finishing all the lines
if current_date:
    print(f"{current_date}\t{total_dpt / count_dpt}")


In [None]:
# Run the mapper and reducer on the hadoop cluster

[xxx@dsm1 weather]$ hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar \
    -files mapper.py,reducer_mean.py \
    -mapper "python3 mapper.py" \
    -reducer "python3 reducer_mean.py" \
    -input /user/xxx/200707hourly.txt -output /user/xxx/200707_mean_dew_point


# Copy the output from HDFS onto the headnode

hadoop fs -copyToLocal /user/xxx/200707_mean_dew_point

# Download the output from the headnode onto local machine

scp -r xxx@dsm1.doc.gold.ac.uk:~/weather/200707_mean_dew_point /Users/jerid/Downloads/

In [None]:
Results:

In [36]:
file = open('weather_final_results/200707_mean_dew_point/part-00000', 'r', encoding='UTF-8')

for line in file.readlines():
    print(line.strip())

20070701	54.41121880800165
20070702	54.871995298266235
20070703	57.25788551401869
20070704	59.58217147553494
20070705	60.22808453421414
20070706	59.42598134131315
20070707	59.81553625223012
20070708	61.24157568604308
20070709	61.533953159938804
20070710	61.20380650277558
20070711	59.09537080497536
20070712	56.90508613858481
20070713	56.98782019452639
20070714	58.23986664327075
20070715	59.077835910326485
20070716	59.63978101121293
20070717	61.56257216164208
20070718	63.56921766292773
20070719	74.78666666666666


## 2.4 Calculating daily variance of Dew Point Temp

Testing the reducer on a smaller file in jupyter notebook:

In [25]:
file = open("mapper_output.txt", "r")

# Creating the function to calculate the variance
def calculate_variance(values):
    if len(values) == 0:
        return None
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return variance

current_date = None
temps = []

for line in file.readlines():
    date_key, value = line.strip().split('\t')
    if "_DPT" in date_key:
        date = date_key.split('_')[0]
        value = float(value)
        
        if current_date == date:
            temps.append(value)
        else:
            if current_date is not None:
                # Calculating and printing the variance for the previous date
                variance = calculate_variance(temps)
                print(f"{current_date}\t{variance}")
            current_date = date
            temps = [value]

# Now I will output the last date
if current_date is not None:
    variance = calculate_variance(temps)
    print(f"{current_date}\t{variance}")

20070701	104.83553875236292
20070702	55.36
20070703	42.65432098765432
20070704	69.06770833333333
20070705	0.0


I will now create the py file within the cluster and copy in the code below:

In [None]:
touch reducer_var.py
nano reducer_var.py

In [None]:
#!/usr/bin/env python
import sys

def calculate_variance(values):
    if len(values) == 0:
        return None
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return variance

current_date = None
temps = []

for line in sys.stdin:
    date_key, value = line.strip().split('\t')
    if "_DPT" in date_key:
        date = date_key.split('_')[0]
        value = float(value)
        
        if current_date == date:
            temps.append(value)
        else:
            if current_date is not None:
                
                variance = calculate_variance(temps)
                print(f"{current_date}\t{variance}")
            current_date = date
            temps = [value]


if current_date is not None:
    variance = calculate_variance(temps)
    print(f"{current_date}\t{variance}")
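
The reducer above stores every temperature for a date in a list before computing the variance. A different technique, Welford's online algorithm, updates the mean and the sum of squared deviations one value at a time, so no list is needed. The sketch below is not what was run on the cluster; the function name `welford_variance` is hypothetical, and it returns the population variance (dividing by n), matching `calculate_variance`.

```python
def welford_variance(values):
    """Population variance via Welford's online update; None for empty input."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / count if count else None


def two_pass_variance(values):
    """Same formula as calculate_variance above, kept here for comparison."""
    if len(values) == 0:
        return None
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Made-up readings whose population variance is 4.0
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
online = welford_variance(readings)
two_pass = two_pass_variance(readings)
```

Both functions agree on the made-up readings up to floating point rounding; the online version only needs three numbers of state per date.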


In [None]:
# Run the mapper and reducer on the hadoop cluster

[xxx@dsm1 weather]$ hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar \
    -files mapper.py,reducer_var.py \
    -mapper "python3 mapper.py" \
    -reducer "python3 reducer_var.py" \
    -input /user/xxx/200707hourly.txt -output /user/xxx/200707_var_dew_point


# Copy the output from HDFS onto the headnode

hadoop fs -copyToLocal /user/xxx/200707_var_dew_point

# Download the output from the headnode onto local machine

scp -r xxx@dsm1.doc.gold.ac.uk:~/weather/200707_var_dew_point /Users/jerid/Downloads/

In [None]:
Results:

In [37]:
file = open('weather_final_results/200707_var_dew_point/part-00000', 'r', encoding='UTF-8')

for line in file.readlines():
    print(line.strip())

20070701	179.3196301999967
20070702	172.6328654575169
20070703	157.3550487999709
20070704	123.4225017157641
20070705	108.4531073742198
20070706	105.31909731045707
20070707	116.19401559309561
20070708	121.7921870716496
20070709	132.6581681197474
20070710	137.68216837540814
20070711	125.50425221523547
20070712	104.96666124415624
20070713	102.27951984757348
20070714	92.49086429757877
20070715	98.7039933057828
20070716	103.3323730293108
20070717	93.91535990097952
20070718	87.63282782753336
20070719	2.0078222222222233


## 3. Correlation matrix that describes the monthly correlation

I will extend the mapper to include the dry bulb temperature and test it on the smaller file in Python:

In [21]:
# Opening the file for reading and the mapper_output file for writing
with open("200707hourly_tiny.txt", "r") as file, open("mapper_output_incl_dry_bulb.txt", "w") as outfile:
    for line in file:
        # I will split the line into columns
        columns = line.strip().split(',')
        
        try:
            # I will now extract necessary fields
            date = columns[1]
            wind_speed = float(columns[12])  
            relative_humidity = float(columns[11].strip('%'))  
            dew_point_temp = float(columns[9])  
            dry_bulb_temp = float(columns[8])
            
            # I will emit data for wind speed analysis, relative humidity, etc 
            outfile.write(f"{date}_WS\t{wind_speed}\n")
            
            
            outfile.write(f"{date}_RH\t{relative_humidity}\n")
            
            
            outfile.write(f"{date}_DPT\t{dew_point_temp}\n")
            
            
            outfile.write(f"{date}_DBT\t{dry_bulb_temp}\n")
            
        except ValueError:
           
            continue
        

In [38]:
#Results

with open("mapper_output_incl_dry_bulb.txt", "r") as file:
    for i, line in enumerate(file):
        print(line.strip())
        if i >= 4:  
            break

20070701_WS	3.0
20070701_RH	22.0
20070701_DPT	18.0
20070701_DBT	57.0
20070701_WS	3.0


Since the above works, I will proceed to create the mapper on the cluster:

In [None]:
# Creating the mapper file using touch command
touch mapper2.py

# Opening the mapper file with nano and I will paste the code below and save
nano mapper2.py

In the nano editor I will paste the code below and save it. It reads from stdin and writes to stdout because that is how Hadoop Streaming communicates with mappers and reducers:

In [None]:
#!/usr/bin/env python
import sys

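for line in sys.stdin:
    columns = line.strip().split(',')

    try:
        date = columns[1]
        wind_speed = float(columns[12])
        relative_humidity = float(columns[11].strip('%'))
        dew_point_temp = float(columns[9])
        dry_bulb_temp = float(columns[8])

        # print() already appends a newline, so no explicit "\n" is needed;
        # an extra one would emit blank lines that the reducer cannot split
        print(f"{date}_WS\t{wind_speed}")
        print(f"{date}_RH\t{relative_humidity}")
        print(f"{date}_DPT\t{dew_point_temp}")
        print(f"{date}_DBT\t{dry_bulb_temp}")

    except (ValueError, IndexError):
        # Skip the header row and any record with missing or non-numeric fields
        continue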

## Reducer for Correlation Matrix Calculation

Testing the reducer in Jupyter notebook:

In [22]:
file = open("mapper_output_incl_dry_bulb.txt", "r")

def mean(data):
    return sum(data) / len(data)


data_ws = []
data_rh = []
data_dpt = []
data_dbt = []


for line in file:
    key, value = line.strip().split('\t')
    value = float(value)
    
    if key.endswith("_WS"):
        data_ws.append(value)
    elif key.endswith("_RH"):
        data_rh.append(value)
    elif key.endswith("_DPT"):
        data_dpt.append(value)
    elif key.endswith("_DBT"):
        data_dbt.append(value)
        
        
mean_ws = mean(data_ws)
mean_rh = mean(data_rh)
mean_dpt = mean(data_dpt)
mean_dbt = mean(data_dbt)


def manual_variance(data, mean):
    return sum((x - mean) ** 2 for x in data) / len(data)

variance_ws = manual_variance(data_ws, mean_ws)
variance_rh = manual_variance(data_rh, mean_rh)
variance_dpt = manual_variance(data_dpt, mean_dpt)
variance_dbt = manual_variance(data_dbt, mean_dbt)


def manual_covariance(data_x, mean_x, data_y, mean_y):
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y)) / len(data_x)

covariance_ws_rh = manual_covariance(data_ws, mean_ws, data_rh, mean_rh)
covariance_ws_dbt = manual_covariance(data_ws, mean_ws, data_dbt, mean_dbt)
covariance_rh_dbt = manual_covariance(data_rh, mean_rh, data_dbt, mean_dbt)


correlation_ws_rh = covariance_ws_rh / (variance_ws ** 0.5 * variance_rh ** 0.5)
correlation_ws_dbt = covariance_ws_dbt / (variance_ws ** 0.5 * variance_dbt ** 0.5)
correlation_rh_dbt = covariance_rh_dbt / (variance_rh ** 0.5 * variance_dbt ** 0.5)


print(f"WS and RH Correlation: {correlation_ws_rh}")
print(f"WS and DBT Correlation: {correlation_ws_dbt}")
print(f"RH and DBT Correlation: {correlation_rh_dbt}")


WS and RH Correlation: -0.09107301779923976
WS and DBT Correlation: -0.07858355336150848
RH and DBT Correlation: -0.6505568727257489


I will now create the py file within the cluster and copy in the code below:

In [None]:
touch reducer_corr.py
nano reducer_corr.py

In [None]:
#!/usr/bin/env python
import sys

def mean(data):
    return sum(data) / len(data)


data_ws = []
data_rh = []
data_dpt = []
data_dbt = []


for line in sys.stdin:
    key, value = line.strip().split('\t')
    value = float(value)
    
    if key.endswith("_WS"):
        data_ws.append(value)
    elif key.endswith("_RH"):
        data_rh.append(value)
    elif key.endswith("_DPT"):
        data_dpt.append(value)
    elif key.endswith("_DBT"):
        data_dbt.append(value)
        
        
mean_ws = mean(data_ws)
mean_rh = mean(data_rh)
mean_dpt = mean(data_dpt)
mean_dbt = mean(data_dbt)


def manual_variance(data, mean):
    return sum((x - mean) ** 2 for x in data) / len(data)

variance_ws = manual_variance(data_ws, mean_ws)
variance_rh = manual_variance(data_rh, mean_rh)
variance_dpt = manual_variance(data_dpt, mean_dpt)
variance_dbt = manual_variance(data_dbt, mean_dbt)


def manual_covariance(data_x, mean_x, data_y, mean_y):
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(data_x, data_y)) / len(data_x)

covariance_ws_rh = manual_covariance(data_ws, mean_ws, data_rh, mean_rh)
covariance_ws_dbt = manual_covariance(data_ws, mean_ws, data_dbt, mean_dbt)
covariance_rh_dbt = manual_covariance(data_rh, mean_rh, data_dbt, mean_dbt)


correlation_ws_rh = covariance_ws_rh / (variance_ws ** 0.5 * variance_rh ** 0.5)
correlation_ws_dbt = covariance_ws_dbt / (variance_ws ** 0.5 * variance_dbt ** 0.5)
correlation_rh_dbt = covariance_rh_dbt / (variance_rh ** 0.5 * variance_dbt ** 0.5)


print(f"WS and RH Correlation: {correlation_ws_rh}")
print(f"WS and DBT Correlation: {correlation_ws_dbt}")
print(f"RH and DBT Correlation: {correlation_rh_dbt}")
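
One thing to be aware of: the reducer pairs values by position with `zip`, which assumes the four lists stay in the same record order. Hadoop sorts mapper output by key before the reducer sees it and does not guarantee the order of values within a key, so on the cluster the paired values may come from different observations. A possible way around this, sketched here under assumptions, is to keep each observation's fields together in one record. The record layout `date<TAB>ws,rh,dbt` and the names `parse_observation` and `pearson` are hypothetical, and the sample values are made up.

```python
import math


def parse_observation(line):
    """Parse a hypothetical 'date<TAB>ws,rh,dbt' record into three floats."""
    _, payload = line.strip().split('\t')
    ws, rh, dbt = (float(v) for v in payload.split(','))
    return ws, rh, dbt


def pearson(xs, ys):
    """Pearson correlation from running sums; None if either series is constant."""
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for x, y in zip(xs, ys):
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    spread = (n * sxx - sx * sx) * (n * syy - sy * sy)
    if spread <= 0:
        return None  # no variation in at least one series (or no data)
    return (n * sxy - sx * sy) / math.sqrt(spread)


# Made-up records; their order does not matter because each line is one observation
lines = [
    "20070701\t3.0,22.0,57.0",
    "20070702\t7.0,18.0,63.0",
    "20070701\t5.0,20.0,60.0",
]
observations = [parse_observation(line) for line in lines]
ws, rh, dbt = zip(*observations)
corr_ws_rh = pearson(ws, rh)
corr_ws_dbt = pearson(ws, dbt)
```

Using `zip` is safe in this sketch because `ws`, `rh` and `dbt` are unpacked from the same tuples, so position `i` always refers to the same observation. In the made-up data humidity falls linearly as wind speed rises and temperature rises linearly with it, so the two correlations come out at -1 and 1 up to rounding.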


In [None]:
# Run the mapper and reducer on the hadoop cluster

[xxx@dsm1 weather]$ hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar \
    -files mapper2.py,reducer_corr.py \
    -mapper "python3 mapper2.py" \
    -reducer "python3 reducer_corr.py" \
    -input /user/xxx/200707hourly.txt -output /user/xxx/200707_corr_matrix


# Copy the output from HDFS onto the headnode

hadoop fs -copyToLocal /user/xxx/200707_corr_matrix

# Download the output from the headnode onto local machine

scp -r xxx@dsm1.doc.gold.ac.uk:~/weather/200707_corr_matrix /Users/jerid/Downloads/

In [None]:
Results:

In [40]:
file = open('weather_final_results/200707_corr_matrix/part-00000', 'r', encoding='UTF-8')

for line in file.readlines():
    print(line.strip())

WS and RH Correlation: -0.3441714843390527
WS and DBT Correlation: 0.5652201413259447
RH and DBT Correlation: -0.09552092151475639
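
The three pairwise values above can be arranged into a symmetric matrix with 1.0 on the diagonal. The sketch below only rearranges the numbers printed by the job; the helper name `build_matrix` is hypothetical.

```python
labels = ["WS", "RH", "DBT"]

# Pairwise correlations copied from the job output above
pairwise = {
    ("WS", "RH"): -0.3441714843390527,
    ("WS", "DBT"): 0.5652201413259447,
    ("RH", "DBT"): -0.09552092151475639,
}


def build_matrix(labels, pairwise):
    """Return a symmetric list-of-lists correlation matrix with a unit diagonal."""
    index = {name: i for i, name in enumerate(labels)}
    size = len(labels)
    matrix = [[1.0 if i == j else None for j in range(size)] for i in range(size)]
    for (a, b), r in pairwise.items():
        # Correlation is symmetric, so fill both (a, b) and (b, a)
        matrix[index[a]][index[b]] = r
        matrix[index[b]][index[a]] = r
    return matrix


matrix = build_matrix(labels, pairwise)

# Print one labelled row per variable
for name, row in zip(labels, matrix):
    print(name, "\t".join(f"{value:.4f}" for value in row))
```

Any pair missing from `pairwise` would be left as `None`, which would make the formatted print fail, so the sketch assumes all three pairs are present.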
