ABOUT THIS PROGRAM

This program uses data from the NOAA Global Historical Climatology Network (GHCN) to analyze historical snowfall and snowpack at North American weather stations, and looks for correlations between these measures and historical El Nino/La Nina episodes between the years 1950 and 2010. The Nino Index is a measure of the variation from a 30-year average. This program calculates similar variations from the 30-year snowfall and snowdepth averages and looks for correlations between these values and the Nino Index. These variations are referred to as "delta" in this program.

More information about the data used in this analysis can be found here: http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt

More detailed information about the work done in this program can be found in the project report, 'IS 602 Final Project.pdf', also included in the project tar file.

This program prompts the user to select the statistical significance level for the correlation analysis, and then to select a weather station to view a scatter plot of the data for that station.

This program was created as a final project for a Data Science Python programming class. As such, the emphasis was mostly on the code and less on the scientific process, so there are some potential scientific issues with this analysis. For example, to simplify the coding, this program ignores missing daily values when calculating monthly averages.

HOW TO USE

Download 'Python Weather Project.tar' and extract all files. This archive contains everything needed EXCEPT the ghcnd_all.tar.gz file, which was too large to include. The ghcnd_all.tar.gz file must be downloaded from the NOAA ftp site: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/

The user will be presented with two options:

- Select 0 to run all of the extraction and processing functions and create the processed data files. At least 30GB of free space is required for extracting and copying files, and the extraction and processing can take up to an hour to complete. You must have all of the following files in your directory to run option 0: ghcnd_all.tar, ghcnd-inventory.txt, ghcnd-stations.txt, Nino_index.csv
- Select 1 to use the pre-processed file, the tar archive ghcnd_processed.tar. This archive contains 37 files with complete data records (all years, all months between October and April) for snowfall and snowdepth for the years 1950 through 2010, which also have significant correlations to the Nino Index when considering average monthly variations from the 30-year monthly average. You must have all of the following files in your directory to run option 1: ghcnd_processed.tar, ghcnd-inventory.txt, ghcnd-stations.txt, Nino_index.csv

METHODOLOGY AND CODING

Importing and extracting the data: Data (".dly" files) are extracted from the TAR file using the tarfile module. In subsequent steps, the metadata files "ghcnd-stations" and "ghcnd-inventory" are used to pare down the list of all station files to just the North American stations that have data for both 1950 and 2010. To further reduce the files to just those that are relevant, a station-by-station analysis is completed: first looking for stations with complete data records (see above), and finally for correlations with the Nino Index as described briefly above. A sketch of the extraction and filtering step follows.
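The snippet below is a minimal sketch of how this extraction and filtering might look, not the project's actual code. The helper name snow_stations and the country-code filter (US, CA, MX) are illustrative assumptions; the fixed-width column positions follow the GHCN-Daily readme linked above.

```python
import tarfile

def snow_stations(inventory_path="ghcnd-inventory.txt"):
    """Stations with SNOW/SNWD records spanning 1950-2010 in North America."""
    keep = set()
    with open(inventory_path) as inv:
        for line in inv:
            station = line[0:11]           # e.g. "USC00050848" (hypothetical)
            element = line[31:35]          # SNOW, SNWD, PRCP, ...
            first, last = int(line[36:40]), int(line[41:45])
            if (station[:2] in ("US", "CA", "MX")
                    and element in ("SNOW", "SNWD")
                    and first <= 1950 and last >= 2010):
                keep.add(station)
    return keep

# Extract only the ".dly" members for the stations that passed the filter.
stations = snow_stations()
with tarfile.open("ghcnd_all.tar") as tar:
    wanted = [m for m in tar.getmembers()
              if m.name.split("/")[-1][:11] in stations]
    tar.extractall(members=wanted)
```

Filtering the member list before calling extractall avoids unpacking tens of thousands of station files that would immediately be discarded.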
STATION ANALYSIS

Note: Each ".dly" file has an identical structure.

1. Monthly Average: Computes a total snowfall and a snowpack average for the "snow months", defined in this project as October through April, for all years (1950 to the present).
2. 30 Year Monthly Average: Computes an average monthly snowfall (an average of the monthly snow totals) and snowpack for the snow months (October through April) for the 30-year period 1981-2010. This is the most recently completed 30-year period and is commonly used to calculate climatological "normals" in meteorology. It should be noted that the ONI data is based on multiple centered thirty-year periods; however, that is not the standard practice in meteorology, and following the ONI method would significantly complicate the analysis for this project. Therefore only the 1981-2010 thirty-year period is used to compute snowfall and snowpack averages.
3. Monthly Snowfall vs 30 Year Average: Calculates the snowfall/snowpack deltas between the values in step 1 (monthly averages) and the values in step 2 (30-year averages). This step creates an array of monthly delta values (October-April) for the weather station, for each year in the ONI record (1950 to the present).
4. Correlation Analysis: Uses scipy.stats.pearsonr to compute the correlation between each array (snowfall and snowpack) from step 3 and the ONI data set. (Each data point in the ONI table is a three-month rolling average, so a January-February-March 2010 value would be compared to a February 2010 value from the snowfall/snowpack arrays.)
5. Scatter Plot and Regression Analysis: A scatter plot with a linear regression line for snowfall and snowpack can be displayed for stations that show statistically significant correlations. For the purposes of this project, significance means a Pearson correlation p-value of 0.10 or less.

Illustrative sketches of steps 1-4 and of the plotting step appear after this list.
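As a rough illustration of steps 1 through 4, the sketch below totals snowfall for the snow months of one ".dly" file, computes 1981-2010 normals and per-month deltas, and correlates them with an aligned ONI series. The fixed-width offsets come from the GHCN-Daily readme; the helper names, the example filename, and the assumption of complete records (guaranteed by the earlier filtering) are all illustrative, not the project's actual functions.

```python
from scipy.stats import pearsonr

SNOW_MONTHS = (10, 11, 12, 1, 2, 3, 4)   # October through April

def monthly_totals(dly_path, element="SNOW"):
    """Step 1: map (year, month) -> total snowfall (mm) for snow months."""
    totals = {}
    with open(dly_path) as dly:
        for line in dly:
            if line[17:21] != element:
                continue
            year, month = int(line[11:15]), int(line[15:17])
            if month not in SNOW_MONTHS:
                continue
            # 31 daily values spaced 8 characters apart; -9999 marks a
            # missing day, which this project simply skips (see above).
            days = [int(line[21 + 8 * d:26 + 8 * d]) for d in range(31)]
            totals[(year, month)] = sum(v for v in days if v != -9999)
    return totals

def monthly_deltas(totals, years=range(1950, 2011)):
    """Steps 2-3: per-month variation from the 1981-2010 average."""
    normal = {m: sum(totals[(y, m)] for y in range(1981, 2011)) / 30.0
              for m in SNOW_MONTHS}
    return [totals[(y, m)] - normal[m] for y in years for m in SNOW_MONTHS]

# Step 4 (assuming a list named oni, pre-aligned so each 3-month rolling
# ONI value, e.g. JFM 2010, pairs with its center month, February 2010):
# deltas = monthly_deltas(monthly_totals("USC00050848.dly"))
# r, p = pearsonr(deltas, oni)
```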
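For step 5, here is a plotting sketch under the same assumptions; matplotlib and scipy.stats.linregress stand in for whatever the project actually uses to draw the regression line.

```python
import matplotlib.pyplot as plt
from scipy.stats import linregress

def plot_station(oni, deltas, station_id):
    """Scatter of ONI vs snowfall delta with a fitted regression line."""
    fit = linregress(oni, deltas)
    plt.scatter(oni, deltas, s=10)
    plt.plot(oni, [fit.slope * x + fit.intercept for x in oni], "r-")
    plt.xlabel("Nino Index (ONI)")
    plt.ylabel("Delta from 30-year average snowfall")
    plt.title("Station %s (r=%.2f, p=%.3f)"
              % (station_id, fit.rvalue, fit.pvalue))
    plt.show()
```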