# <center> Predicting F-10 Days 1-7 Based on 2021 Solar Data </center>

*Background: Forecasters at the SWPC create a 7-day forecast of the daily 10.7 cm Radio Flux (F-10 numbers) based on numerous qualitative and quantitative measures. F-10 numbers are often used as a proxy for how active the Sun is: the higher the number, the more active the Sun. Higher F-10 numbers typically correlate with more complex sunspot regions and X-ray flares.*

This project will determine if a machine learning algorithm can outperform forecasters in predicting F-10 numbers.

## Data Collection, Processing & Staging

### Data Sources:

#### Daily F-10 (observed)

The daily flux was extracted from spaceweather.gc.ca (Government of Canada), where it is published as a web table, and loaded into a pandas DataFrame.

In [61]:
import pandas as pd

In [62]:
import requests

In [93]:
url = 'https://spaceweather.gc.ca/forecast-prevision/solar-solaire/solarflux/sx-5-flux-en.php?year=2021'
html = requests.get(url).content
f10_obs_list = pd.read_html(html)
f10_obs = f10_obs_list[-1]
print(f10_obs)
f10_obs.to_csv('f10_obs.csv')

            Date      Time   Julian day  Carringtonrotation  Observed Flux  \
0     2021-01-01  18:00:00  2459216.239            2239.252           79.8   
1     2021-01-01  20:00:00  2459216.322            2239.255           80.4   
2     2021-01-01  22:00:00  2459216.406            2239.258           79.9   
3     2021-01-02  18:00:00  2459217.239            2239.288           80.6   
4     2021-01-02  20:00:00  2459217.322            2239.291           81.5   
...          ...       ...          ...                 ...            ...   
1084  2021-12-30  20:00:00  2459579.322            2252.563          102.4   
1085  2021-12-30  22:00:00  2459579.406            2252.566          103.2   
1086  2021-12-31  18:00:00  2459580.239            2252.597          102.7   
1087  2021-12-31  20:00:00  2459580.322            2252.600          101.5   
1088  2021-12-31  22:00:00  2459580.406            2252.603          100.3   

      Adjusted Flux  URSI Flux  
0              77.2       69.4

In [94]:
f10_obs.columns

Index(['Date', 'Time', 'Julian day', 'Carringtonrotation', 'Observed Flux',
       'Adjusted Flux', 'URSI Flux'],
      dtype='object')

I have a lot of columns I don't need. I will clean up the table.

In [95]:
f10_obs = f10_obs.drop(['Julian day', 'Carringtonrotation', 'Adjusted Flux', 'URSI Flux'], axis=1)
f10_obs.head()

         Date      Time  Observed Flux
0  2021-01-01  18:00:00           79.8
1  2021-01-01  20:00:00           80.4
2  2021-01-01  22:00:00           79.9
3  2021-01-02  18:00:00           80.6
4  2021-01-02  20:00:00           81.5


Next I need to select all rows where Time is 20:00:00, leaving one reading per day.

In [96]:
f10_obs = f10_obs.loc[f10_obs['Time'] == "20:00:00"]
f10_obs.head()

          Date      Time  Observed Flux
1   2021-01-01  20:00:00           80.4
4   2021-01-02  20:00:00           81.5
7   2021-01-03  20:00:00           80.4
10  2021-01-04  20:00:00           77.6
13  2021-01-05  20:00:00           75.1


Next I will delete the Time column and reindex.

In [97]:
f10_obs = f10_obs.drop(['Time'], axis=1)
f10_obs.head()

          Date  Observed Flux
1   2021-01-01           80.4
4   2021-01-02           81.5
7   2021-01-03           80.4
10  2021-01-04           77.6
13  2021-01-05           75.1


In [101]:
f10_obs = f10_obs.reset_index(drop=True)
f10_obs.head()

         Date  Observed Flux
0  2021-01-01           80.4
1  2021-01-02           81.5
2  2021-01-03           80.4
3  2021-01-04           77.6
4  2021-01-05           75.1


Now I have a daily table with Observed F10 Numbers (also known as Flux).
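The cleaning steps above (drop unused columns, keep the 20:00:00 reading, reindex) can be collected into one helper so they are easy to rerun for another year. This is only a minimal sketch: the function name `daily_observed_flux` and its `obs_time` parameter are hypothetical, and the small frame reuses the first five rows printed earlier in place of the full scraped table.

```python
import pandas as pd

def daily_observed_flux(raw: pd.DataFrame, obs_time: str = "20:00:00") -> pd.DataFrame:
    """Keep one Observed Flux reading per day (hypothetical helper)."""
    # Select the rows for the chosen observation time and only the two needed columns.
    daily = raw.loc[raw["Time"] == obs_time, ["Date", "Observed Flux"]]
    # Renumber the rows 0..n-1 so the index no longer reflects the raw table.
    return daily.reset_index(drop=True)

# First five rows of the raw table shown earlier, used as a stand-in.
sample = pd.DataFrame({
    "Date": ["2021-01-01"] * 3 + ["2021-01-02"] * 2,
    "Time": ["18:00:00", "20:00:00", "22:00:00", "18:00:00", "20:00:00"],
    "Julian day": [2459216.239, 2459216.322, 2459216.406, 2459217.239, 2459217.322],
    "Observed Flux": [79.8, 80.4, 79.9, 80.6, 81.5],
})

daily = daily_observed_flux(sample)
print(daily)
```

Selecting the two wanted columns by name, rather than dropping the unwanted ones, keeps the helper working if the source table gains extra columns.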

In [103]:
f10_obs.shape

(364, 2)
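The table has 364 rows, one fewer than the 365 days of 2021, so at least one calendar day is missing. One way to find such gaps is to compare the dates present against a full calendar range. A minimal sketch follows; `missing_dates` is a hypothetical helper, and the demo frame is an illustration with one day deliberately left out, not the real table.

```python
import pandas as pd

def missing_dates(daily: pd.DataFrame, start: str, end: str) -> pd.DatetimeIndex:
    """Return the calendar days in [start, end] that are absent from daily['Date']."""
    expected = pd.date_range(start, end, freq="D")
    present = pd.DatetimeIndex(pd.to_datetime(daily["Date"]))
    return expected.difference(present)

# Illustration only: 2021-01-03 is deliberately omitted.
demo = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "2021-01-04"],
    "Observed Flux": [80.4, 81.5, 77.6],
})

gaps = missing_dates(demo, "2021-01-01", "2021-01-04")
print(gaps)
```

For the real table the call would presumably be `missing_dates(f10_obs, "2021-01-01", "2021-12-31")`.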

#### Daily F-10 (forecast)

#### Solar Regions Summaries (SRS) from 2021

The SRS contains a daily sunspot record, including: Sunspot Area, Z (Zurich Classification Type), Mag Type, and Location. Other metrics can be calculated from these data, such as daily sunspot number.

A tar.gz file of all daily SRS files from 2021 was downloaded from ftp.swpc.noaa.gov on 12/19/22 and saved locally.

In [13]:
import tarfile

In [16]:
file = tarfile.open('C:/Users/john_/OneDrive/Desktop/SWPC/F10 Project/2021_SRS.tar.gz')

In [17]:
file.extractall('C:/Users/john_/OneDrive/Desktop/SWPC/F10 Project')

In [18]:
file.close()
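The open, extract, close sequence can also be written with a context manager, which closes the archive even if extraction fails. The sketch below is self-contained: it builds a tiny archive in a temporary directory rather than using the local path above, and `extract_archive` and the demo file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path

def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the sorted member names."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        names = sorted(tar.getnames())
        tar.extractall(str(dest_dir))
    return names

# Demo: create a one-file archive, then extract it again.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "demo_SRS"
    src.mkdir()
    (src / "0101SRS.txt").write_text(":Product: 0101SRS.txt\n")

    archive = tmp / "demo_SRS.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname="demo_SRS")

    out = tmp / "out"
    out.mkdir()
    extracted = extract_archive(archive, out)
    demo_file_exists = (out / "demo_SRS" / "0101SRS.txt").exists()

print(extracted)
```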

In [22]:
import os

In [34]:
folder = 'C:/Users/john_/OneDrive/Desktop/SWPC/F10 Project/2021_SRS'
os.chdir(folder) # os.chdir returns None, so the path is kept in its own variable

In [31]:
os.getcwd() # local folder where all files are saved

'C:\\Users\\john_\\OneDrive\\Desktop\\SWPC\\F10 Project\\2021_SRS'

In [41]:
entries = os.listdir(folder) # list all files in folder

In [54]:
len(entries) # count the files in the folder; 365 matches the number of days in 2021

365
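`os.listdir` returns entries in arbitrary order, so the first entry is not guaranteed to be the first day of the year. Sorting the names makes the order deterministic. The sketch below assumes the file names start with a date stamp that sorts chronologically. The helper `list_srs_files` and the demo file names are hypothetical.

```python
import os
import tempfile

def list_srs_files(folder):
    """Return the .txt file names in folder, sorted by name."""
    return sorted(name for name in os.listdir(folder) if name.endswith(".txt"))

# Demo with made-up file names created out of order in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["0103SRS.txt", "0101SRS.txt", "0102SRS.txt", "notes.md"]:
        open(os.path.join(tmp, name), "w").close()
    srs_files = list_srs_files(tmp)

print(srs_files)
```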

Data extraction from the daily SRS files can now begin.

In [55]:
file1 = entries[0] # saving the first file to a variable 

In [56]:
# open the first file, display its text, and close it automatically
with open(file1, "r") as test_file:
    print(test_file.read())

:Product: 0101SRS.txt
:Issued: 2021 Jan 01 0030 UTC
# Prepared jointly by the U.S. Dept. of Commerce, NOAA,
# Space Weather Prediction Center and the U.S. Air Force.
#
Joint USAF/NOAA Solar Region Summary
SRS Number 1 Issued at 0030Z on 01 Jan 2021
Report compiled from data received at SWO on 31 Dec
I.  Regions with Sunspots.  Locations Valid at 31/2400Z 
Nmbr Location  Lo  Area  Z   LL   NN Mag Type
2794 S16W68   344  0180 Hsx  02   01 Alpha
2795 S18W38   316  0030 Bxo  10   04 Beta
IA. H-alpha Plages without Spots.  Locations Valid at 31/2400Z Dec
Nmbr  Location  Lo
None
II. Regions Due to Return 01 Jan to 03 Jan
Nmbr Lat    Lo
None
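The layout of the sample file suggests a simple line-based parser: read the issue time from the `:Issued:` header, then collect the rows between the `I.` and `IA.` section headings. The sketch below is one possible approach, not a definitive implementation. `parse_srs` is a hypothetical helper, and it assumes every region row has exactly eight whitespace-separated fields, as in the sample. The sunspot number is computed as 10 × (number of regions) + (total spot count), the relative sunspot number with the observer factor set to 1. That this is the intended "daily sunspot number" is an assumption.

```python
from datetime import datetime

# Relevant lines of the sample SRS file shown above.
SAMPLE = """:Product: 0101SRS.txt
:Issued: 2021 Jan 01 0030 UTC
I.  Regions with Sunspots.  Locations Valid at 31/2400Z
Nmbr Location  Lo  Area  Z   LL   NN Mag Type
2794 S16W68   344  0180 Hsx  02   01 Alpha
2795 S18W38   316  0030 Bxo  10   04 Beta
IA. H-alpha Plages without Spots.  Locations Valid at 31/2400Z Dec
Nmbr  Location  Lo
None
"""

REGION_FIELDS = ["Nmbr", "Location", "Lo", "Area", "Z", "LL", "NN", "Mag Type"]

def parse_srs(text):
    """Return (issue time, list of region dicts) from SRS text (hypothetical helper)."""
    issued = None
    regions = []
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(":Issued:"):
            stamp = line[len(":Issued:"):].replace("UTC", "").strip()
            issued = datetime.strptime(stamp, "%Y %b %d %H%M")
        elif line.startswith("I. "):
            in_section = True    # start of "Regions with Sunspots"
        elif line.startswith("IA."):
            in_section = False   # start of the next section
        elif in_section and line[:1].isdigit():
            parts = line.split()
            if len(parts) == len(REGION_FIELDS):
                row = dict(zip(REGION_FIELDS, parts))
                for key in ("Lo", "Area", "LL", "NN"):
                    row[key] = int(row[key])
                regions.append(row)
    return issued, regions

issued, regions = parse_srs(SAMPLE)
total_area = sum(r["Area"] for r in regions)
spot_count = sum(r["NN"] for r in regions)
# Assumed definition: 10 * regions + spots (observer factor of 1).
sunspot_number = 10 * len(regions) + spot_count
print(issued, len(regions), total_area, sunspot_number)
```

A day with no spotted regions has `None` in place of the rows, which the digit check skips, so `regions` comes back empty.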



- X-ray flares from 2021 Edited Events