### **Link:** https://platform.stratascratch.com/data-projects/stocks-price-analysis

### **Difficulty:** Hard

# Stocks Price Analysis

<div><p><em>This data project has been used as a take-home assignment in the recruitment process for the data science positions at RedCarpetUp.</em></p>
<h2>Assignment</h2>
<p><strong>Part 1:</strong></p>
<ol>
<li>Create 4, 16, ..., 52-week moving averages of the closing price for each stock. This should happen through a function.</li>
<li>Create a rolling window of size 10 on each stock. Handle unequal time series due to stock market holidays. You should then increase the rolling window size to 75 and see what the data looks like. Remember that larger windows put more load on your laptop's RAM. (Documentation you might need: <a href="http://in.mathworks.com/help/econ/rolling-window-estimation-of-state-space-models.html">http://in.mathworks.com/help/econ/rolling-window-estimation-of-state-space-models.html</a>)</li>
<li>Create the following dummy time series:<br>
3.1 Volume shocks - If the volume traded is 10% higher/lower than on the previous day, make a 0/1 boolean time series for the shock and a 0/1 dummy-coded time series for the direction of the shock.<br>
3.2 Price shocks - If the closing price at T vs T+1 differs by &gt; 2%, make a 0/1 boolean time series for the shock and a 0/1 dummy-coded time series for the direction of the shock.<br>
3.3 Pricing black swan - If the closing price at T vs T+1 differs by &gt; 2%, make a 0/1 boolean time series for the shock and a 0/1 dummy-coded time series for the direction of the shock.<br>
3.4 Pricing shock without volume shock - Based on points 3.1 &amp; 3.2, make a 0/1 dummy time series.</li>
</ol>
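
The Part 1 tasks can be sketched with pandas roughly as follows. This is only one possible reading of the brief, not a reference solution. The column names `Close` and `Volume`, the 5-trading-days-per-week conversion, the helper names, and the direction coding (1 for an upward shock, 0 otherwise) are all assumptions made for illustration. Changes are measured against the previous trading row, so holidays simply do not appear as rows and do not shorten a window. Only the window lengths named explicitly in the brief are used as defaults; any intermediate lengths are left to the caller.

```python
import pandas as pd


def add_moving_averages(df, weeks=(4, 16, 52), price_col="Close", days_per_week=5):
    """Add one trailing moving-average column per requested window length."""
    out = df.copy()
    for w in weeks:
        # Windows count trading rows, so market holidays do not shrink them.
        out[f"MA_{w}w"] = out[price_col].rolling(window=w * days_per_week).mean()
    return out


def add_shock_flags(df, col, threshold, prefix):
    """Flag row-over-row moves larger than `threshold` (a fraction, e.g. 0.02)."""
    out = df.copy()
    change = out[col].pct_change()  # change relative to the previous trading row
    out[f"{prefix}_shock"] = (change.abs() > threshold).astype(int)
    # Assumed coding: 1 = upward shock, 0 = downward shock or no shock.
    out[f"{prefix}_shock_up"] = (change > threshold).astype(int)
    return out


def add_all_shocks(df):
    """Volume shocks (10%), price shocks (2%), and price shocks without volume shocks."""
    out = add_shock_flags(df, "Volume", 0.10, "volume")
    out = add_shock_flags(out, "Close", 0.02, "price")
    no_volume = (out["price_shock"] == 1) & (out["volume_shock"] == 0)
    out["price_shock_no_volume"] = no_volume.astype(int)
    return out


# Tiny synthetic frame, for illustration only (not real INFY/TCS data).
demo = pd.DataFrame(
    {
        "Close": [100.0, 101.0, 104.0, 104.5, 101.0, 101.2],
        "Volume": [1000, 1050, 1040, 1300, 1310, 1000],
    },
    index=pd.bdate_range("2015-01-01", periods=6),
)
flags = add_all_shocks(demo)
# A deliberately short window so the toy frame produces non-missing averages.
averaged = add_moving_averages(demo, weeks=(1,), days_per_week=2)
```

The same `add_shock_flags` helper could be reused for item 3.3 with whatever threshold is chosen for a black swan, and `rolling(window=10)` or `rolling(window=75)` on the same frame covers the rolling-window item.
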
<p><strong>Part 2 (data visualization):</strong>
For this section, you can use only <a href="https://bokeh.pydata.org/en/latest/docs/gallery.html">bokeh</a>.</p>
<ol>
<li>Create a time-series plot of close prices of stocks with the following features:</li>
<li>Color the time series in simple blue color.</li>
<li>Color time series between two volume shocks in a different color (Red)</li>
<li>Gradient color in blue spectrum based on the difference of 52-week moving average.</li>
<li>Mark closing-price shocks that occur without a volume shock to identify volumeless price movement.</li>
<li>Handcraft a partial autocorrelation plot for each stock, for up to all lookbacks, in bokeh. Sample reference: <a href="https://www.statsmodels.org/dev/generated/statsmodels.graphics.tsaplots.plot_pacf.html">https://www.statsmodels.org/dev/generated/statsmodels.graphics.tsaplots.plot_pacf.html</a></li>
</ol>
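
For the handcrafted partial autocorrelation plot, one option is to compute the values with the Durbin-Levinson recursion and draw them as bars in bokeh. The sketch below is an illustration under assumptions, not the expected answer: the function names are hypothetical, the estimator is only one of several in common use (so values may differ slightly from the statsmodels defaults), and the dashed bounds use the usual approximate 95% level of ±1.96/√n. The bokeh import is kept inside the plotting function so the numeric part runs without bokeh installed.

```python
import math


def autocorrelations(series, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf_durbin_levinson(series, max_lag):
    """Partial autocorrelations for lags 0..max_lag via the Durbin-Levinson recursion."""
    rho = autocorrelations(series, max_lag)
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new last one.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def pacf_figure(values, n_obs, title="Partial autocorrelation"):
    """Build a bokeh bar chart of PACF values with approximate 95% bounds."""
    from bokeh.models import Span
    from bokeh.plotting import figure

    fig = figure(title=title, x_axis_label="Lag", y_axis_label="PACF")
    fig.vbar(x=list(range(len(values))), top=values, width=0.2)
    bound = 1.96 / math.sqrt(n_obs)
    for level in (bound, -bound):
        fig.add_layout(Span(location=level, dimension="width", line_dash="dashed"))
    return fig


# Toy input, for illustration only.
toy_pacf = pacf_durbin_levinson([1, 2, 3, 4, 5], max_lag=2)
```

With real data, `pacf_figure(pacf_durbin_levinson(closes, max_lag), len(closes))` would return a figure that can be passed to `bokeh.plotting.show`, where `closes` is a list of closing prices.
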
<p><strong>Part 3 (data modeling)</strong>
For this section, you should use sklearn.</p>
<ol>
<li>Quickly build any two models. A quick build is defined as a grid search over fewer than 9 parameter combinations. Choose the two models from the multivariate options listed below. The goal is to predict tomorrow's INFY and TCS prices. Models that you can choose:
<ul>
<li><a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLars.html#sklearn.linear_model.LassoLars">LassoLars</a></li>
<li><a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression">Linear Regression</a></li>
<li><a href="http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression">Ridge Regression</a></li>
<li><a href="http://scikit-learn.org/stable/modules/svm.html#regression">Support Vector Regression</a></li>
<li><a href="http://scikit-learn.org/stable/modules/ensemble.html#regression">Gradient Boosting Regression</a></li>
</ul>
</li>
<li>Write test cases for the two models you have chosen. Your testing should cover at least 5 time steps, excluding today. Your test cases must be written using pytest.</li>
<li>Prove your model does not violate any basic assumption. To understand "model assumptions", read
<a href="https://www.albert.io/blog/key-assumptions-of-ols-econometrics-review/">https://www.albert.io/blog/key-assumptions-of-ols-econometrics-review/</a></li>
<li>Select the best-performing model and tune it. Demonstrate that your tuning has resulted in a clear difference between the quick build and the tuned model.</li>
<li><em>Extra credit</em> - Nest a model to predict volume shock into your time series model - same conditions applied as above.</li>
<li><em>Extra extra credit</em> - Create a bare Python file such that <code>python stockpredictor.py 'INFY'</code> returns a prediction in less than 100 ms.</li>
</ol>
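
A quick build in the sense described above might look like the following sketch. It is an illustration under assumptions rather than a recommended model: the lag-feature construction, the choice of Ridge, the parameter grid, and the helper names are all hypothetical, and the prices are a synthetic random walk, not INFY or TCS data. The grid has 3 x 2 = 6 combinations, which stays under the limit of 9, and `TimeSeriesSplit` keeps every validation fold later in time than its training fold.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def make_lagged_features(close, n_lags=5):
    """Each row holds the previous `n_lags` closes; the target is the next close."""
    close = np.asarray(close, dtype=float)
    n_rows = len(close) - n_lags
    X = np.column_stack([close[i:i + n_rows] for i in range(n_lags)])
    y = close[n_lags:]
    return X, y


def quick_build(X, y):
    """Grid search over 6 parameter combinations with time-ordered folds."""
    grid = {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}
    search = GridSearchCV(
        Ridge(),
        grid,
        cv=TimeSeriesSplit(n_splits=3),
        scoring="neg_mean_absolute_error",
    )
    return search.fit(X, y)


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
prices = 1000 + np.cumsum(rng.normal(0, 5, size=250))
X, y = make_lagged_features(prices)

# Hold out the last 5 time steps for testing, as the brief requires at least 5.
search = quick_build(X[:-5], y[:-5])
predictions = search.predict(X[-5:])


def test_holdout_predictions():
    """pytest-style check that the model predicts all 5 held-out steps."""
    assert predictions.shape == (5,)
    assert np.isfinite(predictions).all()
```

Comparing `search.best_score_` from this quick build with the score of a wider, later search is one way to show the difference that tuning makes.
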
<h2>Data Description</h2>
<p>You are encouraged to use the <a href="https://github.com/swapniljariwala/nsepy">NSEPY module</a> for loading the data.</p>
<p>The original assignment was based on daily OHLCV data for the NSE stocks with symbols INFY and TCS, covering 2015-2016. However, you can complete this project using any stock data you select. You can also choose other time periods.</p>
<p><strong>Example: loading stock data into a Pandas DataFrame using NSEPY:</strong></p>
<pre><code>!pip install nsepy
from datetime import date

import nsepy

# Daily price history for INFY over calendar year 2015
infy_df = nsepy.get_history(symbol='INFY',
                            start=date(2015, 1, 1),
                            end=date(2015, 12, 31))
</code></pre>
<p>For convenience, we are providing you with the data on INFY and TCS stocks from 2015 in two CSV files: <code>INFY_2015.csv</code> and <code>TCS_2015.csv</code> respectively.</p>
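
If you work from the CSV files instead, a loader along these lines may be enough. It is a minimal sketch: the helper name `load_prices` is hypothetical, and the presence of a `Date` column is an assumption about the file layout that should be checked against the actual headers. The in-memory sample below is synthetic and only demonstrates the parsing and sorting.

```python
import io

import pandas as pd


def load_prices(source, date_col="Date"):
    """Read a price CSV, parse the date column, and index rows in date order."""
    df = pd.read_csv(source, parse_dates=[date_col])
    return df.sort_values(date_col).set_index(date_col)


# Synthetic two-row sample standing in for a real file, for illustration only.
sample_csv = io.StringIO(
    "Date,Close,Volume\n"
    "2015-01-02,101.0,1200\n"
    "2015-01-01,100.0,1000\n"
)
sample = load_prices(sample_csv)
```

For the provided files, the call would be `load_prices("INFY_2015.csv")` or `load_prices("TCS_2015.csv")`, assuming they sit in the working directory.
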
<h2>Practicalities</h2>
<p>Please work on the questions in the displayed order. Define, train and evaluate predictive models that take the provided data as input. You may want to split the data into training, testing and validation sets, at your discretion.</p>
<p>Make sure that the solution reflects your entire thought process: how the code is structured matters more than the final metrics. You are expected to spend no more than 3 hours working on this project.</p></div>

## **Data:**

## **Solution:**