This is poisson time series regression code (in R) for Advanced Statistics course.

staciewow/poisson-time-series-regression

poisson-time-series-regression

This repository contains Poisson time series regression code (in R) for an Advanced Statistics course.

The data consists of hourly records of Beijing PM2.5 from Sept 20th, 2014 to Oct 19th, 2014. We downloaded it from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data). The data includes 720 records (30 days * 24 records/day) and 8 variables: year, month, day, hour, pm2.5, dew point, pressure and temperature. For the purpose of this research, we focus only on the time variables (year, month, day, hour) and pm2.5 (ug/m^3).
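As a rough illustration of the data layout described above, the sketch below builds a synthetic stand-in with one row per hour and keeps only the time variables and pm2.5. The pm2.5 values are randomly generated placeholders, not the real measurements, and the object names (`stamps`, `data`) are assumptions made for this example.

```r
# Synthetic stand-in for the hourly records; the pm2.5 values are made up.
set.seed(1)

# One timestamp per hour from Sept 20th to Oct 19th, 2014 (inclusive).
stamps <- seq(as.POSIXct("2014-09-20 00:00", tz = "UTC"),
              as.POSIXct("2014-10-19 23:00", tz = "UTC"),
              by = "hour")

# Keep only the variables used in the analysis: time parts and pm2.5.
data <- data.frame(
  year  = as.integer(format(stamps, "%Y")),
  month = as.integer(format(stamps, "%m")),
  day   = as.integer(format(stamps, "%d")),
  hour  = as.integer(format(stamps, "%H")),
  pm2.5 = rpois(length(stamps), lambda = 80)  # placeholder values in ug/m^3
)

# 30 days * 24 records/day
n_records <- nrow(data)
n_days    <- length(unique(as.Date(stamps)))
```

With the real file, the same column selection would be applied after reading the downloaded CSV and filtering to this date range.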

PM2.5 refers to atmospheric particulate matter (PM) that has a diameter of less than 2.5 micrometers, which is about 3% of the diameter of a human hair. Because these fine particles are so small and light, they tend to stay in the air longer than heavier particles, which increases the chance that humans and animals inhale them. Owing to their minute size, particles smaller than 2.5 micrometers are able to bypass the nose and throat and penetrate deep into the lungs, and some may even enter the circulatory system. Fine particles come from various sources, including power plants, motor vehicles, airplanes, residential wood burning, forest fires, agricultural burning, volcanic eruptions and dust storms.

Beijing has suffered from air pollution with high pm2.5 for some time, so the Beijing government set a pm2.5 goal for 2014 (http://www.reuters.com/article/us-argentina-economy-startups/soros-cohen-among-big-name-investors-betting-on-argentine-startups-idUSKBN1DG32R). The goal is 60 ug/m^3: if the hourly pm2.5 is lower than 60 ug/m^3, the goal was achieved for that hour; if the hourly pm2.5 is higher than 60 ug/m^3, the goal was not reached for that hour.

Before fitting a Poisson time series regression model, we pre-processed and cleaned the data. We recoded each hour as 1 if pm2.5 reached the goal and as 0 if it did not. We then aggregated the data to the "day" level by counting how many hours in each day reached the goal. Finally, we fitted a regression model to these daily counts and used it to predict how many hours would reach the goal in each of the 3 days following Oct 19th, 2014.
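The recoding, daily aggregation and prediction steps described above can be sketched as follows. This is a minimal example on synthetic data, and it assumes a plain Poisson GLM with a linear day trend (`glm` with `family = poisson`); the model specification actually used in the repository may differ. The names `hourly`, `daily`, `day_index` and `hours_met` are hypothetical.

```r
set.seed(42)

# Synthetic hourly pm2.5 values for 30 days (made up, not the real data).
n_days <- 30
hourly <- data.frame(
  day_index = rep(seq_len(n_days), each = 24),
  pm2.5     = round(rgamma(n_days * 24, shape = 2, scale = 40))
)

# Recode each hour: 1 if pm2.5 is below the goal of 60 ug/m^3, else 0.
goal <- 60
hourly$met_goal <- as.integer(hourly$pm2.5 < goal)

# Aggregate to the day level: number of hours per day that met the goal.
daily <- aggregate(met_goal ~ day_index, data = hourly, FUN = sum)
names(daily)[2] <- "hours_met"

# Assumed model: Poisson regression of the daily count on a linear day trend.
fit <- glm(hours_met ~ day_index, family = poisson(link = "log"), data = daily)

# Predict the expected number of goal-meeting hours for the next 3 days.
future <- data.frame(day_index = n_days + 1:3)
pred   <- predict(fit, newdata = future, type = "response")
```

`type = "response"` returns predictions on the count scale rather than the log scale, which is the scale on which "hours per day" is interpreted.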

We first imported the dataset as "data". Because the pm2.5 column contains 2 NA values, we replaced each NA with the average of the previous and the next pm2.5 value, and named the processed, smoothed dataset smo_data. The smoothed dataset covers 30 days of records, which is roughly 4 weeks. We therefore set the time series frequency to 7*24 (one week of hourly records), which also lets us see the weekly trend.
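A minimal sketch of this imputation and of the time series setup is shown below. It uses synthetic values, and the positions of the two NA entries are chosen arbitrarily for illustration; it also assumes each gap is isolated and not at the start or end of the series, so that both neighbours exist. The name `pm_ts` is an assumption.

```r
set.seed(7)

# Synthetic stand-in for the imported hourly data (values are made up).
data <- data.frame(pm2.5 = round(rgamma(720, shape = 2, scale = 40)))
data$pm2.5[c(100, 400)] <- NA  # two isolated gaps at arbitrary positions

# Replace each NA with the mean of the previous and the next value.
smo_data <- data
na_idx <- which(is.na(smo_data$pm2.5))
for (i in na_idx) {
  smo_data$pm2.5[i] <- mean(c(data$pm2.5[i - 1], data$pm2.5[i + 1]))
}

# Hourly series with a weekly cycle: 7 days * 24 hours per period.
pm_ts <- ts(smo_data$pm2.5, frequency = 7 * 24)
```

With `frequency = 7 * 24`, the 720 hourly records span a little over four full periods, so functions such as `plot` or `decompose` treat one week as the seasonal cycle.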
