# DATA 601 Final Project
# Severe Thunderstorm Climatology vs Sunspot Number 2019-2020
## Python weather analysis application that compares the occurrence of severe thunderstorm activity over the continental United States with daily international sunspot number measurements.

### Overview:
December 2019 was confirmed as the starting point of the new solar activity cycle, when sunspot activity was observed to be at an 11-year minimum. Through the winter and spring of 2020, sunspot number gradually increased, and significant severe thunderstorm outbreaks occurred over the continental United States (CONUS) during February and April 2020.

<img src= "images/prediSC.png"> 

### Abstract:
The reference dates and amplitudes of the minima and maxima of the 11-year solar cycle are established from the sunspot number, which has been maintained and distributed since 1981 by the SILSO World Data Center at the Royal Observatory of Belgium. Sunspots are dark areas that appear in the Sun's photosphere where intense magnetic flux is forced upward from deeper within the solar interior, as illustrated in Figure 1. Heated areas along the magnetic flux in the upper photosphere and chromosphere become visible as faculae, while the cores of these magnetic fields are cooler (7000 °F), less dense, and darker than the surrounding photosphere (10,000 °F). These cores are the visible sunspots. Active regions associated with sunspot groups are usually visible as bright enhancements in the corona at EUV and X-ray wavelengths. Clette et al. (2014) provide a historical review and perspective of the development and evolution of sunspot number calculation techniques, most notably the International Sunspot Number.

Previous studies of the potential impact of sunspot activity on global weather and climate patterns have identified relationships between sunspot number and both south Asian monsoon rainfall patterns (Rao 1976) and thunderstorm activity over central Europe (Schlegel et al. 2001). Schlegel et al. (2001), comparing monthly averaged sunspot number and lightning stroke frequency, inferred a significant functional relationship between sunspot number and thunderstorm activity over central Europe (correlation coefficients of 0.53 and 0.65; see the table from Schlegel et al. (2001) reproduced below). However, that study focused only on warm-season (i.e., April to September) thunderstorm occurrence. Thunderstorm outbreaks during the cold season (i.e., November through March/April) typically occur in environments with low convective available potential energy (CAPE) and weak solar heating of the Earth's surface.

Cold-season severe thunderstorms are often generated when large-scale forcing releases potential instability. Thus, another enhancing factor for cold-season severe thunderstorms should be explored to describe storm outbreaks on a regional scale. The period from January to April 2020 exhibited several thunderstorm episodes that increased in magnitude as the new solar cycle commenced.

<img src= "images/Fig1.png"> 

### Figure 1. Graphical description of sunspots and associated faculae.

<img src= "images/XML.png"> 

### Figure 2. Sample of XML file as retrieved from WxData API.

### Methodology/Data Acquisition:
Thunderstorm wind reports over CONUS are obtained via the [WxData API](https://wxdata.com/api-storm-reports-explorer) in XML format, as shown in Figure 2. Corresponding daily sunspot number values are obtained from the [Sunspot Index and Long-term Solar Observations (SILSO) website](http://sidc.be/silso/datafiles). Since lightning is relatively rare during the cold season, measurements of severe wind (i.e., wind > 40 miles per hour (mph)) associated with convective storms are selected as a proxy variable for thunderstorm occurrence. The total number of wind reports is calculated for each semi-monthly period between November 2019 and April 2020 and compared to the sum of daily sunspot number values for the same period. To build a more comprehensive dataset, both land-based and marine (i.e., coastal and nearshore) severe thunderstorm wind reports are obtained for each semi-monthly period and concatenated into a single Pandas dataframe, as demonstrated in the notebook [thunderstorm_sunspot_datacall](https://github.com/kenpryor67/Severe_Thunderstorms_vs_Sunspot_Activity/blob/main/Notebooks/thunderstorm_sunspot_datacall.ipynb).

This notebook defines a function, "dataframe_strmrpt", that builds a storm report listing for each semi-monthly period between November 2019 and April 2020, and for the month of November 2020. The function 1) builds a storm report dictionary from XML files, 2) tests for the existence of storm reports for the period of interest, 3) cleans the dataset by removing extraneous characters, such as the "MPH" label in each wind report field, and 4) returns a storm report dataframe. Each semi-monthly storm report dataframe is saved as a .csv file for retrieval in the following notebooks. An example of the format of the storm report dataframe is shown below.

In [2]:

    # Print the head of the storm report dataframe
    import pandas as pd
    df_wxdata_apr2020l = pd.read_csv("data/wxdata_apr2020l.csv")
    print(df_wxdata_apr2020l.head())

       Unnamed: 0                 Date      ID  Magnitude State
    0           0  2020-04-18 11:06:00  883680         44    FL
    1           1  2020-04-18 11:19:00  883679         44    FL
    2           2  2020-04-18 13:06:00  883499         41    FL
    3           3  2020-04-18 13:06:00  883585         41    FL
    4           4  2020-04-18 13:21:00  883586         43    FL

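The four steps performed by "dataframe_strmrpt" can be sketched roughly as follows. This is not the notebook's implementation: the XML tag names (`report`, `id`, `date`, `magnitude`, `state`), the helper name `build_storm_report_df`, and the mock XML strings are all assumptions made for illustration, since the actual WxData schema is only shown as an image in Figure 2.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Mock XML with hypothetical tag names; the real WxData schema may differ.
LAND_XML = """
<reports>
  <report><id>883680</id><date>2020-04-18 11:06:00</date><magnitude>44 MPH</magnitude><state>FL</state></report>
  <report><id>883499</id><date>2020-04-18 13:06:00</date><magnitude>41 MPH</magnitude><state>FL</state></report>
</reports>
"""
MARINE_XML = """
<reports>
  <report><id>883586</id><date>2020-04-18 13:21:00</date><magnitude>43 MPH</magnitude><state>FL</state></report>
</reports>
"""

COLUMNS = ["Date", "ID", "Magnitude", "State"]


def build_storm_report_df(xml_text):
    """Hypothetical stand-in for the notebook's dataframe_strmrpt function."""
    root = ET.fromstring(xml_text.strip())

    # 1) Build a storm report dictionary from the XML elements.
    reports = {col: [] for col in COLUMNS}
    for rpt in root.iter("report"):
        reports["Date"].append(rpt.findtext("date"))
        reports["ID"].append(rpt.findtext("id"))
        reports["Magnitude"].append(rpt.findtext("magnitude"))
        reports["State"].append(rpt.findtext("state"))

    # 2) Test for the existence of storm reports for the period.
    if not reports["ID"]:
        return pd.DataFrame(columns=COLUMNS)

    # 3) Clean the dataset: strip the "MPH" label and convert types.
    df = pd.DataFrame(reports)
    df["Magnitude"] = (
        df["Magnitude"].str.replace("MPH", "", regex=False).str.strip().astype(int)
    )
    df["ID"] = df["ID"].astype(int)
    df["Date"] = pd.to_datetime(df["Date"])

    # 4) Return the storm report dataframe.
    return df


# Concatenate land-based and marine reports into one dataframe for the period.
combined = pd.concat(
    [build_storm_report_df(LAND_XML), build_storm_report_df(MARINE_XML)],
    ignore_index=True,
)
print(combined)
```

A combined dataframe like this could then be written out with `combined.to_csv(...)`, which would also explain the `Unnamed: 0` index column that appears when the file is read back.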

The exploratory data analysis component of this study is detailed in the notebook [thunderstorm_sunspot_EDA](https://github.com/kenpryor67/Severe_Thunderstorms_vs_Sunspot_Activity/blob/main/Notebooks/thunderstorm_sunspot_EDA.ipynb). This notebook contains a series of modules that extract and print descriptive statistics for each semi-monthly storm report dataframe: maximum, mean, and median wind magnitude, and number of storm reports. An additional module reads the daily total sunspot number dataset obtained from SILSO and builds a Pandas dataframe (428 rows x 5 columns) from the columns of the SILSO dataset, as demonstrated below.

In [3]:

    sunspot_2019_2020 = pd.read_csv("data/sunspot_2019_2020.csv")
    print(sunspot_2019_2020)

         Unnamed: 0  Year  Month  Day  Sunspot No.
    0             0  2019     10    1            6
    1             1  2019     10    2            6
    2             2  2019     10    3            0
    3             3  2019     10    4            0
    4             4  2019     10    5            0
    5             5  2019     10    6            0
    6             6  2019     10    7            0
    7             7  2019     10    8            0
    8             8  2019     10    9            0
    9             9  2019     10   10            0
    10           10  2019     10   11            0
    11           11  2019     10   12            0
    12           12  2019     10   13            0
    13           13  2019     10   14            0
    14           14  2019     10   15            0
    15           15  2019     10   16            0
    16           16  2019     10   17            0
    17           17  2019     10   18            0
    18           18  2019     10   19            0
    19           19  2019     10   20            0
    ...

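The semi-monthly binning and descriptive statistics described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the helper name `semi_monthly_totals` and the fixed split after day 15 are hypothetical (the summary table in this README splits February after day 14), the daily values mirror the first 20 sunspot rows printed above, and the wind magnitudes mirror the five storm report rows shown earlier.

```python
import pandas as pd

# Daily sunspot rows mirroring the first 20 rows printed above (October 1-20, 2019).
daily = pd.DataFrame({
    "Year": [2019] * 20,
    "Month": [10] * 20,
    "Day": list(range(1, 21)),
    "Sunspot No.": [6, 6] + [0] * 18,
})


def semi_monthly_totals(daily_df, split_day=15):
    """Sum and average the daily sunspot number for each half of each month.

    split_day is an assumed cutoff: days 1..split_day form half 1,
    the remaining days form half 2.
    """
    half = (daily_df["Day"] > split_day).astype(int) + 1
    return (
        daily_df.assign(Half=half)
        .groupby(["Year", "Month", "Half"])["Sunspot No."]
        .agg(total="sum", mean="mean")
        .reset_index()
    )


sunspot_bins = semi_monthly_totals(daily)
print(sunspot_bins)

# Descriptive statistics for one storm report dataframe
# (magnitudes copied from the five sample rows shown earlier).
storm = pd.DataFrame({"Magnitude": [44, 44, 41, 41, 43]})
storm_stats = storm["Magnitude"].agg(["max", "mean", "median", "count"])
print(storm_stats)
```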
The ultimate objective of the notebook [thunderstorm_sunspot_EDA](https://github.com/kenpryor67/Severe_Thunderstorms_vs_Sunspot_Activity/blob/main/Notebooks/thunderstorm_sunspot_EDA.ipynb) is to build a dataframe that tabulates descriptive statistics for the entire study period, including November 2020, a particularly active month for convective storm activity over CONUS. The resulting dataframe (14 rows, indexed 0 to 13 in the printout below) compares severe thunderstorm parameters to the cumulative (total) sunspot number for each semi-monthly period.

In [4]:

    solar_storm_df = pd.read_csv("data/solar_storm.csv")
    print(solar_storm_df)

        Unnamed: 0       Period  Total Sunspot No.  Storm Report No.  \
    0            0   11/1-15/19                 14                56
    1            1  11/16-30/19                  0                34
    2            2   12/1-15/19                  0                10
    3            3  12/16-31/19                 47                33
    4            4    1/1-15/20                 91               120
    5            5   1/16-31/20                101                14
    6            6    2/1-14/20                  6               196
    7            7   2/15-29/20                  0                38
    8            8    3/1-15/20                 29                80
    9            9   3/16-31/20                 17                71
    10          10    4/1-15/20                 52               334
    11          11   4/16-30/20                104               586
    12          12   11/1-15/20                374               693
    13          13  11/16-30/20                646

Linear regression is selected as the most appropriate model with which to derive a functional relationship. The notebook [Thunderstorm_climatology_model](https://github.com/kenpryor67/Severe_Thunderstorms_vs_Sunspot_Activity/blob/main/Notebooks/Thunderstorm_climatology_model.ipynb) details the comparison of linear and RANSAC regression models applied to the Sunspot-Storm Report dataframe displayed above. For the study period (November 2019 - April 2020), the coefficient of determination, correlation coefficient, and explained variance are calculated for a dataset of 12 paired semi-monthly totals of storm wind reports and sunspot number to assess model performance. Thus, this study expands upon Schlegel et al. (2001) by analyzing sunspot and thunderstorm totals on a semi-monthly time scale during the cold season.

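A minimal sketch of such a comparison, assuming scikit-learn's `LinearRegression` and `RANSACRegressor` with default settings, is given below. The 12 paired totals are copied from the November 2019 to April 2020 rows of the table above. The notebook's own preprocessing and model settings (for example, how the January 16-31 outlier is treated) are not reproduced here, so the printed metrics need not match the values reported in this README.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.metrics import explained_variance_score, r2_score

# Semi-monthly totals for 11/1/19 through 4/30/20, copied from the table above.
sunspot_total = np.array([14, 0, 0, 47, 91, 101, 6, 0, 29, 17, 52, 104], dtype=float)
storm_reports = np.array([56, 34, 10, 33, 120, 14, 196, 38, 80, 71, 334, 586], dtype=float)

X = sunspot_total.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = storm_reports

# Ordinary least-squares fit on all 12 periods.
linear = LinearRegression().fit(X, y)
y_lin = linear.predict(X)
pearson_r = np.corrcoef(sunspot_total, storm_reports)[0, 1]
lin_r2 = r2_score(y, y_lin)
lin_ev = explained_variance_score(y, y_lin)

# RANSAC fit: fits lines to random subsets and keeps the largest consensus set.
# random_state is fixed only so that this sketch is repeatable (an assumption).
ransac = RANSACRegressor(random_state=0).fit(X, y)
inliers = ransac.inlier_mask_
y_ran = ransac.predict(X[inliers])
ran_r2 = r2_score(y[inliers], y_ran)
ran_ev = explained_variance_score(y[inliers], y_ran)

print(f"Linear: r={pearson_r:.2f}, R^2={lin_r2:.2f}, explained variance={lin_ev:.2f}")
print(f"RANSAC: {inliers.sum()} inliers, R^2={ran_r2:.2f}, explained variance={ran_ev:.2f}")
```

For an ordinary least-squares line with an intercept, the coefficient of determination equals the squared correlation coefficient, so the first two linear metrics carry the same information. The RANSAC metrics are computed on the inlier subset only, which is one reason they can differ from the linear fit.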
### Research Questions:
1. What is the relationship between sunspot number and cold-season severe thunderstorm frequency?
2. How can this relationship be quantified, modeled, and assessed?

<img src= "images/table2_schlegel.png"> 

### Figure 3. Table 2 from Schlegel et al. (2001).

## Discussion/Summary:

Thunderstorm wind reports over CONUS were obtained via the WxData API (https://wxdata.com/api-storm-reports-explorer), and corresponding daily sunspot number values were obtained from the Sunspot Index and Long-term Solar Observations (SILSO) website (http://sidc.be/silso/datafiles). The total number of wind reports was calculated for each semi-monthly period between November 2019 and April 2020 and compared to the sum of daily sunspot number values for the same period. For the study period (November 2019 - April 2020), regression metrics were calculated for a dataset of 12 paired storm wind report and sunspot number totals. The correlation coefficient of 0.6 between the thunderstorm wind report total and the sunspot number total for each semi-monthly period is consistent with the results of Schlegel et al. (2001), who documented correlation coefficients of 0.53 and 0.65 between monthly lightning stroke (delta) number and sunspot number (R) over an entire solar cycle (1992-2000). In addition, explained variance values of 0.39 to 0.41 suggest that sunspot activity could act as one of at least three important forcing factors for cold-season thunderstorms, alongside atmospheric instability and weather patterns.

In summary, this study entailed a short-term analysis of thunderstorm reports and sunspot number measurements, binned into semi-monthly segments over the first six months of solar cycle 25 and during the cold season. The correlation of 0.6 for both semi-monthly total and mean sunspot numbers indicates a significant functional relationship between sunspot activity and cold-season thunderstorm activity over CONUS. This result can serve as a starting point for more detailed studies of the relationship between solar cycles and the intensity of regional convective storm activity.

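As a rough check on what a correlation of 0.6 from 12 paired totals implies, the standard t-test for a Pearson correlation coefficient can be worked through. This is a sketch that assumes independent, approximately normally distributed samples, an assumption the analysis above does not state and that consecutive semi-monthly periods may not fully satisfy.

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{0.6\sqrt{12-2}}{\sqrt{1-0.36}} \approx 2.37
$$

With n - 2 = 10 degrees of freedom, the two-tailed critical value at the 5% level is about 2.23, so under these assumptions a correlation of 0.6 is significant at that level, though not by a wide margin.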
## Limitations/Future Work:
An outlier in this dataset is the low storm wind report total for January 16-31, 2020, marked by a red "X" in the scatterplots. Although the sunspot number for this period was relatively high, the small number of wind events, which were recorded over offshore ocean waters, suggests that most severe thunderstorm activity occurred over subtropical open ocean waters, where weather observations are sparse relative to land areas.

To derive a more robust functional relationship between sunspot number and thunderstorm activity over CONUS, this procedure will be repeated for the cold seasons of an entire solar cycle (i.e., an 11-year period).

## References:
Clette, F., Svalgaard, L., Vaquero, J.M. et al. Revisiting the Sunspot Number. Space Sci Rev 186, 35–103 (2014). https://doi.org/10.1007/s11214-014-0074-2

King, J. R., M. D. Parker, K. D. Sherburn, and G. M. Lackmann, 2017: Rapid Evolution of Cool Season, Low-CAPE Severe Thunderstorm Environments. Wea. Forecasting, 32, 763–779, https://doi.org/10.1175/WAF-D-16-0141.1.

Rao, Y. P., 1976: Southwest Monsoon. India Meteorological Department, New Delhi, 2-5.

Schlegel, K., Diendorfer, G., Thern, S., & Schmidt, M. (2001). Thunderstorms, lightning and solar activity—Middle Europe. Journal of Atmospheric and Solar-Terrestrial Physics, 63(16), 1705-1713.

“The New Solar Activity Cycle.” Sunspot Index and Long-Term Solar Observations, Royal Observatory of Belgium, Brussels, 15 Sept. 2020, sidc.be/silso/node/167/#NewSolarActivity. 