Skip to content

pc1991/AMCStockData

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AMC Stock Data from 2014-Present

(Courtesy: Arpit Verma of Kaggle)

What is AMC Entertainment?

AMC Entertainment Holdings, Inc. (d/b/a AMC Theatres, originally an abbreviation for American Multi-Cinema; often referred to simply as AMC and known in some countries as AMC Cinemas or AMC Multi-Cinemas) is an American movie theater chain headquartered in Leawood, Kansas, and the largest movie theater chain in the world. Founded in 1920, AMC has the largest share of the U.S. theater market ahead of Regal and Cinemark Theatres. After acquiring Odeon Cinemas, UCI Cinemas, and Carmike Cinemas in 2016, it became the largest movie theater chain in the world. It has 2,866 screens in 358 theatres in Europe and 7,967 screens in 620 theatres in the United States.

Information about this dataset

This dataset provides historical data of AMC Entertainment Holdings, Inc. (AMC). The data is available at a daily level. Currency is USD.

Key:

Date - Actual Date

Open - Price from the first transaction of a trading day

High - Maximum price in a trading day

Low - Minimum price in a trading day

Close - Price from the last transaction of a trading day

Adj Close - Closing price adjusted to reflect the value after accounting for any corporate actions

Volume - Number of units traded in a day.

Setting Up The Validation Dataset

Now, in order to figure out which model works well for this dataset, a validation dataset is generated with 80% of the rows in the original dataset to use for training, with the remaining 20% of the data to be validated. The following graphs monitor the behavior of this training dataset.

Validation Boxplots:

AMC Stock Open Boxplot

AMC Stock High Boxplot

AMC Stock Low Boxplot

AMC Stock Close Boxplot

AMC Stock Adj Close Boxplot

AMC Stock Volume Boxplot

Validation Histograms:

AMC Stock Open Histogram

AMC Stock High Histogram

AMC Stock Low Histogram

AMC Stock Close Histogram

AMC Stock Adj Close Histogram

AMC Stock Volume Histogram

Validation Density Plots:

AMC Stock Open Density Plot

AMC Stock High Density Plot

AMC Stock Low Density Plot

AMC Stock Close Density Plot

AMC Stock Adj Close Density Plot

AMC Stock Volume Density Plot

Scatter Plot:

AMC Stock Scatter Plot

Correlation Plot:

AMC Stock Correlation Plot

Finding The Best Model To Run The Data

In order to figure out which model to use to best predict where the data will trend, we would have to develop a test-harness using 10-fold cross validation with three repeats using various algorithms using various units of measure. The dot-plots showing the performances of each algorithm will be shown below.

First Outcome Comparing AMC Stock Algorithms:

The algorithms in this first outcome are currently centered and scaled with the highly correlated values.

outcome <- resamples(list(LM = fit.lm, GLM = fit.glm, GLMNET = fit.glmnet, SVM = fit.svm, KNN = fit.knn))

summary(outcome)

Call:

summary.resamples(object = outcome)

Models: LM (Linear Model), GLM (Generalized Linear Model), GLMNET (Penalized Linear Model), SVM (Support Vector Machines), KNN (k-Nearest Neighbors)

Number of resamples: 30

MAE

       Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's

LM 12953051 15748600 18889597 18867085 20288098 28714143 0

GLM 12953051 15748600 18889597 18867085 20288098 28714143 0

GLMNET 12525068 14823381 16795523 16891746 18838886 21194430 0

SVM 5469373 9163424 11043345 11686925 14508471 18779865 0

KNN 3680526 7838716 9201742 9888181 12449735 18141919 0

RMSE

       Min.  1st Qu.   Median     Mean  3rd Qu.      Max. NA's

LM 18448670 28932707 39036872 45528957 54000982 90905224 0

GLM 18448670 28932707 39036872 45528957 54000982 90905224 0

GLMNET 15277092 26185167 37147190 41555763 52632754 90660491 0

SVM 7514733 26758981 40627907 44926372 60926524 101696784 0

KNN 11264514 27632479 39947197 45696253 59777525 106387510 0

Rsquared

         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's

LM 0.30432783 0.4632728 0.5191817 0.5231661 0.5945303 0.7912763 0

GLM 0.30432783 0.4632728 0.5191817 0.5231661 0.5945303 0.7912763 0

GLMNET 0.31383694 0.4896796 0.5381849 0.5549713 0.6309693 0.8571197 0

SVM 0.32678897 0.4956748 0.5800586 0.5776315 0.7037011 0.8830470 0

KNN 0.03048308 0.3402185 0.5595156 0.4853565 0.6163114 0.8478114 0

First Dotplot Outcome Comparing AMC Stock Algorithms

Second Dotplot Outcome Comparing AMC Stock Algorithms:

The algorithms in this second outcome are currently centered and scaled without the highly correlated values.

outcome2 <- resamples(list(LM = fit.lm, GLM = fit.glm, GLMNET = fit.glmnet, SVM = fit.svm, KNN = fit.knn))

summary(outcome2)

Call:

summary.resamples(object = outcome2)

Models: LM, GLM, GLMNET, SVM, KNN

Number of resamples: 30

MAE

       Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's

LM 17120963 21566122 23612492 24000348 26919221 30935049 0

GLM 17120963 21566122 23612492 24000348 26919221 30935049 0

GLMNET 12525068 14823381 16795523 16891746 18838886 21194430 0

SVM 7473668 13559723 15741403 16038858 18803797 23202159 0

KNN 11324251 16083905 17852422 18445773 21068954 24700748 0

RMSE

       Min.  1st Qu.   Median     Mean  3rd Qu.      Max. NA's

LM 22152423 40720829 53675120 57491085 72138560 112722853 0

GLM 22152423 40720829 53675120 57491085 72138560 112722853 0

GLMNET 15277092 26185167 37147190 41555763 52632754 90660491 0

SVM 11644364 35641578 48940324 53708943 69864280 111750830 0

KNN 22902736 38815155 48411236 56359413 72153949 111490350 0

Rsquared

           Min.      1st Qu.      Median       Mean    3rd Qu.       Max. NA's

LM 1.483748e-05 0.0009268357 0.006890912 0.01918216 0.03102276 0.08289168 0

GLM 1.483748e-05 0.0009268357 0.006890912 0.01918216 0.03102276 0.08289168 0

GLMNET 3.138369e-01 0.4896796108 0.538184948 0.55497133 0.63096929 0.85711975 0

SVM 6.511007e-03 0.0458225992 0.142131962 0.20097528 0.30306220 0.71290539 0

KNN 7.462957e-03 0.0339076231 0.121907488 0.14276862 0.21591304 0.51108661 0

Second Dotplot Outcome Comparing AMC Stock Algorithms

Third Outcome Comparing AMC Stock Algorithms:

The algorithms in this third outcome are currently centered, scaled, and transformed using the Box-Cox method with the highly correlated values.

outcome3 <- resamples(list(LM = fit.lm, GLM = fit.glm, GLMNET = fit.glmnet, SVM = fit.svm, KNN = fit.knn))

summary(outcome3)

Call:

summary.resamples(object = outcome3)

Models: LM, GLM, GLMNET, SVM, KNN

Number of resamples: 30

MAE

       Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's

LM 14317954 19166740 20684487 20684035 22974526 26701879 0

GLM 14317954 19166740 20684487 20684035 22974526 26701879 0

GLMNET 15919979 18634403 20616042 20416441 22130482 24761106 0

SVM 5680386 8945198 10876737 11650119 14548465 18924878 0

KNN 4032008 7916614 9216504 9910743 12324078 17621998 0

RMSE

       Min.  1st Qu.   Median     Mean  3rd Qu.      Max. NA's

LM 20952996 29348149 36896169 42420653 49173955 94586011 0

GLM 20952996 29348149 36896169 42420653 49173955 94586011 0

GLMNET 20668753 29614356 36502576 41740995 48523019 87240671 0

SVM 7826026 26266721 39953805 44807230 60338621 102646673 0

KNN 13423914 28075877 39029866 45705324 59636949 104705672 0

Rsquared

         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's

LM 0.20995793 0.4161312 0.5313362 0.5180253 0.6168191 0.8587488 0

GLM 0.20995793 0.4161312 0.5313362 0.5180253 0.6168191 0.8587488 0

GLMNET 0.21172570 0.4511238 0.5180361 0.5098385 0.5880617 0.7885906 0

SVM 0.29198507 0.4809362 0.5704306 0.5699200 0.7150422 0.8766482 0

KNN 0.02963859 0.3563041 0.5478481 0.4798609 0.6387612 0.8239043 0

Third Dotplot Outcome Comparing AMC Stock Algorithms

Ensemble Outcomes Comparing AMC Stock Algorithms

Three additional ensemble outcomes are added to improve the analysis of analyzing the data, all of which are transformed via the Box-Cox method.

ensemble <- resamples(list(RF = fit.rf, GBM = fit.gbm, CUBIST = fit.cubist))

summary(ensemble)

Call:

summary.resamples(object = ensemble)

Models: RF (Random Forest), GBM (Gradient Booster Method), CUBIST

Number of resamples: 30

MAE

       Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's

RF 5665450 7472449 8695215 9341970 11044333 14681649 0

GBM 10711818 13155020 14458774 14632866 16536161 18984534 0

CUBIST 3757394 5768194 7417536 7440709 8955991 12277018 0

RMSE

       Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's

RF 16168780 22975529 32974827 38079897 48155583 70467545 0

GBM 20919656 27326500 36363300 42106731 52561442 88884199 0

CUBIST 10527833 17260591 24987716 31072433 41017138 67678798 0

Rsquared

        Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's

RF 0.1021230 0.5789694 0.6794716 0.6267992 0.7462763 0.8697522 0

GBM 0.1110092 0.4715421 0.5421579 0.5268957 0.6505939 0.7527193 0

CUBIST 0.1621931 0.6962783 0.7533742 0.7322539 0.8724464 0.9536887 0

Ensemble Dotplot Outcomes Comparing AMC Stock Algorithms

Picking The Most Efficient Model

The aim for picking the most efficient model is to find the model with the lowest root mean square error (RMSE). There's also a mean absolute error (MAE) & an r-squared error. However, our main concern is the RMSE. Overall, given the data that is shown above, it is proven that the Cubist model would be the best model to run the data.

#Cubist Wins#

#Look deeper into winning model#

print(fit.cubist)

Cubist

1570 samples

5 predictors . Pre-processing: Box-Cox transformation (5)

Resampling: Cross-Validated (10 fold, repeated 3 times)

Summary of sample sizes: 1413, 1414, 1413, 1414, 1412, 1412, ...

Resampling results across tuning parameters:

committees neighbors RMSE Rsquared MAE

1 0 45728962 0.4990559 10291250

1 5 44921116 0.5135953 10035744

1 9 45045970 0.5123765 10115521

10 0 35877489 0.6351564 8310675

10 5 35542554 0.6366793 8500939

10 9 35559963 0.6389609 8574697

20 0 35253899 0.6521506 8158116

20 5 34942691 0.6501143 8382772

20 9 34929826 0.6531267 8448448

RMSE was used to select the optimal model using the smallest value.

The final values used for the model were committees = 20 and neighbors = 9.

#Train the Final Model#

library(Cubist)

set.seed(7)

x <- amc[,1:5]

y <- amc[,6]

preprocessParams <- preProcess(x)

tX <- sample(1:nrow(amc), floor(.8*nrow(amc)))

p <- c("Open", "High", "Low", "Close", "Adj Close")

tXp <- amc[tX, p]

tXr <- amc$Volume[tX]

fM <- cubist(x = tXp, y = tXr, commitees = 20, neighbors = 9)

fM

summary(fM)

predictions <- predict(fM, tXp)

#Compute the RMSE & R^2#

rmse <- sqrt(mean((predictions - tXr)^2))

r2 <- cor(predictions, tXr)^2

print(rmse) #17255498# - RMSE Value

print(r2) #0.903859# - R2 Value

Conclusion

In closing, this model above will most likely work for this type of regressive dataset. But, please also take note that those values fluctuate every day. Therefore, you still want to make sure to compare each and every algorithms to find the best possible model to generate a prediction of when the next price and volume will be.

Copyright

Copyright (c) 2021 Robert (Christian) Paul & Arpit Verma

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

AMC Stock Data from 2014-Present

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages