In this kernel, I briefly analyze stock price data as a basis for further analysis. The kernel has two parts:

1. Descriptive analysis: A brief analysis of the data
2. Technical analysis: A simple approach to stock technical analysis

**1. Descriptive analysis:**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 
import datetime
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# Any results you write to the current directory are saved as output.
os.chdir('../input/spanish-stocks-historical-data')
print(os.listdir())

In [None]:
os.listdir()

In [None]:
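# Keep only the CSV files so that any other file in the folder is not read as stock data
filenames = [x for x in os.listdir() if x.endswith('.csv')]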
print(filenames)

First, we read every dataset in the input folder and concatenate them into one DataFrame for further analysis. We keep only the recent data (from 2016-01-01 onward in the code below), since it is the most up to date and the most relevant to future growth.

In [None]:
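li = []
for filename in filenames:
    stock_name = filename.replace('.csv', '')
    df = pd.read_csv(filename, index_col=None, header=0)
    df['Name'] = stock_name
    df['Date'] = pd.to_datetime(df['Date'])
    # Sort by date so pct_change compares each close with the previous day's close;
    # .copy() avoids SettingWithCopyWarning when adding the Return column
    df = df[df['Date'] >= '2016-01-01'].sort_values('Date').copy()
    df['Return'] = df['Close'].pct_change()
    # Drop rows with missing values, including the first row (it has no previous close)
    df = df.dropna()
    li.append(df)

stock_df = pd.concat(li, axis=0, ignore_index=True)
stock_df.head()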

In [None]:
stock_df[['Name','Close']].groupby(['Name']).count()

We need to check whether every stock has the same number of observations. In this case, each stock has an equal count of 619 rows.
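The count check above can also be done programmatically instead of by reading the table. This is a minimal sketch on a small made-up frame standing in for `stock_df`; the tickers and prices are illustrative only.

```python
import pandas as pd

# Toy stand-in for stock_df; the tickers and prices are made up for illustration.
toy_df = pd.DataFrame({
    'Name': ['aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb'],
    'Close': [10.0, 10.5, 10.2, 20.0, 19.8, 20.4],
})

# Number of rows per stock.
rows_per_stock = toy_df.groupby('Name')['Close'].count()

# True when every stock has the same number of observations.
all_equal = rows_per_stock.nunique() == 1
print(rows_per_stock.to_dict(), all_equal)
```

On the real data, the same two lines applied to `stock_df` should confirm the equal count without inspecting each row of the output.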

In [None]:
stock_df.isnull().sum()

Before analyzing the dataset, we need to count the missing values. In this case there are none (rows with missing values were already dropped while loading), so we are good to go!

In [None]:
stock_df.head()

In [None]:
def get_stock_data(stock_name):
    return stock_df[stock_df['Name'] == stock_name]

In [None]:
stock_df.Name.unique()

In [None]:
acs = get_stock_data('acs')
acs.head()

In [None]:
# Recent pandas versions require keyword arguments for pivot
stock_pivot = stock_df.pivot(index='Date', columns='Name', values='Close').reset_index()
stock_pivot.head()

Next, we plot a heatmap of the pairwise correlations between the closing prices.

In [None]:
plt.figure(figsize = (15,10))
# Move Date into the index so that only the price columns are correlated
sns.heatmap(stock_pivot.set_index('Date').corr())

Now we have the correlation heatmap of our Spanish stock data. There are several interesting correlations to explain here.
The correlation between two stocks ranges from -1 to 1. The higher the correlation, the more the two stocks tend to move together.
As we can see from the heatmap, the darker the square, the lower the correlation of that pair of stocks. Some pairs are close to being perfectly negatively correlated (a correlation near -1). For example, the media group (mediaset, telefnica and atresmedia) is almost negatively correlated with energy (repsol, red-elctrica and naturgy-energy). Another example is media versus the construction industry (colonial, fcc and ferrovial).
On the other hand, some groups are strongly positively correlated, such as banking and insurance (bankinter, caixabank and mapfre) with media.
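Reading the strongest pairs off a large heatmap by eye is error-prone, so it can help to rank the pairs directly. The sketch below does this for a toy price table; the tickers and values are made up, and on the real data the same steps would start from the correlation matrix plotted above.

```python
import numpy as np
import pandas as pd

# Toy closing prices; the tickers and values are made up for illustration.
prices = pd.DataFrame({
    'aaa': [1.0, 2.0, 3.0, 4.0, 5.0],
    'bbb': [2.0, 4.1, 5.9, 8.2, 10.0],  # moves with aaa
    'ccc': [5.0, 4.0, 3.0, 2.0, 1.0],   # moves against aaa
})

corr = prices.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values()

most_negative = pairs.index[0]   # pair with the lowest correlation
most_positive = pairs.index[-1]  # pair with the highest correlation
print(pairs)
```

Sorting the pairs this way gives the most negatively correlated pairs at the top and the most positively correlated ones at the bottom.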

In [None]:
stock_df.Name.unique()

In [None]:
stock_df.head()

In [None]:
energy_stocks = ['naturgy-energy','repsol', 'red-elctrica','iberdrola','acciona','siemens-gamesa','abengoa']
media_stocks = ['telefnica','mediaset', 'atresmedia','indra']
construction_stocks = ['colonial', 'fcc','ferrovial','acs','sacyr']
banking_insurance_stocks = ['bankinter', 'caixabank','mapre','banco-sabadell','mapfre','santander','bme','bbva']
production_stocks = ['acerinox','enags','inditex','grifols']

# Use a string default so the column keeps a single dtype once the group names are assigned
stock_df['Group'] = 'Other'

In [None]:
stock_df.loc[stock_df['Name'].isin(energy_stocks), 'Group'] = 'Energy'
stock_df.loc[stock_df['Name'].isin(media_stocks), 'Group'] = 'Media'
stock_df.loc[stock_df['Name'].isin(construction_stocks), 'Group'] = 'Construction'
stock_df.loc[stock_df['Name'].isin(banking_insurance_stocks), 'Group'] = 'Banking'
stock_df.loc[stock_df['Name'].isin(production_stocks), 'Group'] = 'Production'

In [None]:
stock_df.Group.unique()

In [None]:
stock_df.head()

In [None]:
energy_df = stock_df[stock_df['Group'] == 'Energy'][['Date','Close','Return', 'Volume']].groupby('Date').mean()
production_df = stock_df[stock_df['Group'] == 'Production'][['Date','Close','Return','Volume']].groupby('Date').mean()
construction_df = stock_df[stock_df['Group'] == 'Construction'][['Date','Close','Return','Volume']].groupby('Date').mean()
media_df = stock_df[stock_df['Group'] == 'Media'][['Date','Close','Return','Volume']].groupby('Date').mean()
banking_df = stock_df[stock_df['Group'] == 'Banking'][['Date','Close','Return','Volume']].groupby('Date').mean()

plt.figure(figsize = (15,6))
top = plt.subplot2grid((4,4), (0, 0), rowspan=3, colspan=4)
# The grid has 4 rows and the top panel uses rows 0-2, so the bottom panel spans only row 3
bottom = plt.subplot2grid((4,4), (3,0), rowspan=1, colspan=4)

top.plot(energy_df.index,energy_df.Close)
top.plot(production_df.index,production_df.Close)
top.plot(construction_df.index,construction_df.Close)
top.plot(media_df.index,media_df.Close)
top.plot(banking_df.index,banking_df.Close)
top.legend(['Energy', 'Production', 'Construction', 'Media', 'Banking'])

bottom.plot(energy_df.index,energy_df.Volume)
bottom.plot(production_df.index,production_df.Volume)
bottom.plot(construction_df.index,construction_df.Volume)
bottom.plot(media_df.index,media_df.Volume)
bottom.plot(banking_df.index,banking_df.Volume)

As we can see from the chart above, the energy sector's stock price is growing and gradually becomes the highest of the five groups, which also makes it the sector with the largest return. The production sector's price is fairly stable, while construction shows a slight increase in return. Banking and Media appear to be in a declining trend, resulting in a negative return for investors.

However, even though the banking sector has only one more component than the energy sector, there is a huge difference between their trading volumes. This is probably because banking stocks have more shares outstanding in the market than energy stocks do, which results in their lower trading prices. As investors, we should look at the return of a stock rather than its price. Still, the return of the banking sector is declining, so if I had the chance to invest, I would prefer the energy and construction industries.
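To illustrate why return is a better yardstick than price, the sketch below compares two hypothetical stocks with very different price levels. The tickers and prices are made up; the point is only that compounding the daily returns puts both on the same scale.

```python
import pandas as pd

# Toy closing prices for two hypothetical stocks with very different price levels.
close = pd.DataFrame({
    'cheap': [2.0, 2.2, 2.42],
    'pricey': [100.0, 101.0, 102.01],
})

# Daily return, as computed for the Return column earlier in the kernel.
daily_return = close.pct_change().dropna()

# Compound the daily returns: (1 + r1) * (1 + r2) * ... - 1
cumulative_return = (1 + daily_return).prod() - 1
print(cumulative_return.round(4))
```

Here the cheaper stock gains 21% while the more expensive one gains about 2%, even though the expensive stock's price rises by more in absolute terms.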

In [None]:
# Distribution of the daily mean return of each sector.
# sns.distplot is deprecated in recent seaborn versions; histplot with kde=True gives a comparable plot.
sector_returns = {
    'Energy': energy_df['Return'],
    'Production': production_df['Return'],
    'Construction': construction_df['Return'],
    'Media': media_df['Return'],
    'Banking': banking_df['Return'],
}

plt.figure(figsize = (15,12))
for i, (sector, returns) in enumerate(sector_returns.items(), start=1):
    ax = plt.subplot(2, 3, i)
    sns.histplot(returns, kde=True, ax=ax)
    ax.set_title(sector)

In [None]:
testing = pd.concat([energy_df.Return,banking_df.Return, production_df.Return, media_df.Return, construction_df.Return],axis = 1)
testing.columns = ['Energy', 'Banking','Production','Media','Construction']

plt.figure(figsize = (15,10))
sns.heatmap(testing.corr())

As we can see from the correlation matrix above, the energy sector has little positive correlation with the other sectors. From my research, most of the companies in the energy group use clean, renewable resources. From the earlier line chart, we can conclude that the Energy sector experienced much stronger price growth than the others did. I truly believe clean energy is our future. However, this innovative industry is negatively correlated with the Banking and Media sectors. Perhaps the media is not doing enough to raise people's awareness of clean energy. The output of the energy sector feeds the production and construction industries, which may explain the relationship among them.

Production and construction have a significant positive correlation, which is easily explained: the output of one is an input of the other.

Banking and Media are significantly correlated with all other sectors except energy. Since banking facilitates economic growth, it is understandable that growth in this industry leads to growth in the others.

When choosing stocks for our portfolio, we should consider the correlation among them, preferably picking ones that are commonly positively correlated. In the next part, I will discuss the technical analysis of stock data further.

**2. Technical analysis:**

The moving average (MA) is a widely used indicator in technical analysis that helps smooth out price action by filtering out the “noise” from random short-term price fluctuations. It is a trend-following, or lagging, indicator because it is based on past prices. There are two main types of moving average: the Simple Moving Average (SMA) and the Exponential Moving Average (EMA).

The Simple Moving Average (SMA), as its name suggests, is the easier of the two to compute. The formula for this indicator is:
![](https://miro.medium.com/max/1010/1*sTy16YVV5JSyuYk7gYLsbQ.png)
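In words, the SMA over a window of n days is the sum of the last n closing prices divided by n. The following plain-Python sketch spells that out on made-up prices; the function name `simple_moving_average` is illustrative and not part of any library.

```python
def simple_moving_average(prices, window):
    """Return the SMA series; positions without a full window are None."""
    sma = []
    for i in range(len(prices)):
        if i + 1 < window:
            sma.append(None)  # not enough history yet
        else:
            recent = prices[i + 1 - window : i + 1]  # the last `window` prices
            sma.append(sum(recent) / window)
    return sma

prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
sma3 = simple_moving_average(prices, 3)
print(sma3)  # [None, None, 11.0, 12.0, 13.0]
```

In pandas, `Series.rolling(window).mean()` computes the same values, with NaN in place of None for the first positions.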

For stock technical analysis, we have the golden cross and the death cross, which are the intersections between the 50-day (short-term) and 200-day (long-term) moving averages. A death cross occurs when the short-term MA crosses below the long-term MA. While the short-term MA stays below the long-term MA, the market is considered bearish and the price is expected to fall. A golden cross occurs when the short-term MA crosses above the long-term MA, and the market is then expected to break out upward.

Or, to put it simply: if we see a death cross, consider shorting the stock. If we see a golden cross, then perhaps buying is a good idea. Let's dive into the calculation!
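The crossing rule can also be detected in code rather than read off a chart. The sketch below is one possible way to do it, not a definitive implementation: the helper `find_crosses` is hypothetical, the prices are made up, and the windows of 2 and 4 days are chosen only to keep the example short (the kernel uses 50 and 200).

```python
import pandas as pd

def find_crosses(close, short_window, long_window):
    """Return the dates of golden crosses and death crosses for a price series."""
    short_ma = close.rolling(short_window).mean()
    long_ma = close.rolling(long_window).mean()
    # Only compare once the long MA exists (the short MA exists by then as well).
    valid = long_ma.notna()
    above = (short_ma > long_ma)[valid].astype(int)  # 1 if short MA is above long MA
    change = above.diff()
    golden = change[change == 1].index   # short MA moved above the long MA
    death = change[change == -1].index   # short MA moved below the long MA
    return golden, death

# Made-up prices: a fall, then a rise, then another fall.
dates = pd.date_range('2020-01-01', periods=12, freq='D')
close = pd.Series([10, 9, 8, 7, 8, 9, 10, 11, 10, 9, 8, 7], index=dates, dtype=float)

golden, death = find_crosses(close, short_window=2, long_window=4)
print('golden:', list(golden))
print('death:', list(death))
```

Note that both signals arrive a few days after the price has already turned, which is the lagging behaviour described earlier.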

I have written a function for calculating and plotting the moving average lines. We use the pandas rolling function to compute the mean of a column over a given number of most recent rows.

In [None]:
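def plot_ma(stock_name):
    # .copy() avoids SettingWithCopyWarning when adding the MA columns
    df = stock_df[stock_df['Name'] == stock_name].copy()
    df['Short_MA'] = df['Close'].rolling(50).mean()   # 50-day (short-term) MA
    df['Long_MA'] = df['Close'].rolling(200).mean()   # 200-day (long-term) MA
    plt.figure(figsize = (15,6))
    plt.xticks(rotation=45)
    # Labels are needed for plt.legend() to show anything
    plt.plot(df.Date, df.Close, label='Close')
    plt.plot(df.Date, df.Short_MA, label='Short MA (50 days)')
    plt.plot(df.Date, df.Long_MA, label='Long MA (200 days)')
    plt.title(stock_name)
    plt.legend()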

In [None]:
plot_ma('bbva')

As you can see, from late 2016 to late 2017 the long-term MA stayed well below the short-term MA line, and the market appeared bullish over the same period, with rising stock prices. There is a death cross in late 2017 to early 2018, which was followed by a bearish market, with the stock price falling from 2018 to early 2019.

However, the MA alone cannot tell us exactly what is going on in the market. In some periods the MA gives an incorrect short-term signal. Even though the period from 2018 to early 2019 shows the stock price gradually declining in the long run, the MA could not predict the stock's short-term fluctuations (at times the price rose only to fall again). That is a limitation of the MA.

If you feel this kernel is helpful, please send me an upvote for further motivation. Thank you!