# **EDA Simplified: JPX Stock Exchange Prediction**

## Intro
Medium and large businesses around the world invest in stock markets, including the market behind the Beijing-based Ubiquant competition. But this time we are not in the Ubiquant competition, we are in JPX! As with Ubiquant, we make time series forecasting predictions on future market data, but here the data comes from a stock exchange. Before we dive into the competition, let's do some **Exploratory Data Analysis**!

## Importing
Importing the modules for our EDA is simple. We first import pandas as pd (for data handling) and numpy as np (for numerical work, specifically linear algebra). Next come the plotting modules: matplotlib's pyplot submodule as plt, seaborn as sns, and finally, most important of all, plotly's express submodule as px!

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

## Dataframe Generation
Since the JPX Stock Exchange Prediction data consists mostly of CSV files, we read them with the `read_csv` function from `pd`: the 5 CSV files in the train_files folder plus `stock_list.csv`. Then we preview each of our 6 newly created dataframes with the head function!

In [None]:
stock_list_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/stock_list.csv')
financials_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/financials.csv')
options_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/options.csv')
sec_stock_prices_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/secondary_stock_prices.csv')
stock_prices_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/stock_prices.csv')
trades_df = pd.read_csv('../input/jpx-tokyo-stock-exchange-prediction/train_files/trades.csv')

In [None]:
stock_list_df.head()

In [None]:
financials_df.head()

In [None]:
options_df.head()

In [None]:
sec_stock_prices_df.head()

In [None]:
stock_prices_df.head()

In [None]:
trades_df.head()

## First Look In Data
Now that we have 6 dataframes, let's take a first look at how many observations (rows) each of them holds!

In [None]:
obs_stock_list = stock_list_df.shape[0]
print(f"No of observations (stock_list): {obs_stock_list}")

In [None]:
obs_financials = financials_df.shape[0]
print(f"No of observations (financials): {obs_financials}")

In [None]:
obs_options = options_df.shape[0]
print(f"No of observations (options): {obs_options}")

In [None]:
obs_ssp = sec_stock_prices_df.shape[0]
print(f"No of observations (ssp): {obs_ssp}")

In [None]:
obs_stock_prices = stock_prices_df.shape[0]
print(f"No of observations (stock_prices): {obs_stock_prices}")

In [None]:
obs_trades = trades_df.shape[0]
print(f"No of observations (trades): {obs_trades}")
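
The six cells above can also be collapsed into one table. This is a minimal sketch, assuming the dataframes are collected in a dictionary; the `frames` dictionary and the tiny stand-in dataframes are made up for illustration and are not the competition data.

```python
import pandas as pd

# Tiny stand-in frames (hypothetical); in the notebook these would be
# stock_list_df, financials_df, options_df, and so on.
frames = {
    "stock_list": pd.DataFrame({"SecuritiesCode": [1301, 1332, 1333]}),
    "trades": pd.DataFrame({"Date": ["2017-01-13", "2017-01-20"]}),
}

# One row per dataframe: number of observations (rows) and columns.
summary = pd.DataFrame(
    {name: {"rows": df.shape[0], "columns": df.shape[1]} for name, df in frames.items()}
).T
print(summary)
```

Keeping the counts in one dataframe makes it easy to compare the files side by side.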

As always, let's head on to the fun part: E-D-A over J-P-X! Sounds good, so let's warp in and graph them!

## EDA
The EDA part is split into 6 sections, one chapter for each dataframe we created!

### **Chapter 1: stock_list**
First, let's head over to the stock_list_df dataframe! We use matplotlib (through pandas) to find how many stocks fall under each sector code. We take the '33SectorCode' column of stock_list_df, call value_counts to count the rows per sector code, and then call plot with 4 parameters: kind set to 'bar' (we are building a bar graph), title set to "Sector Codes", figsize set to (15, 10) (width and height), and fontsize set to 12. The returned axes go into the variable ax, and we label them with set_xlabel ("Sector Codes") and set_ylabel ("Entities on Each Sector Code"), both with fontsize set to 12.

In [None]:
ax = stock_list_df['33SectorCode'].value_counts().plot(kind='bar', title='Sector Codes', figsize=(15, 10), fontsize=12)
ax.set_xlabel("Sector Codes", fontsize=12) 
ax.set_ylabel("Entities on Each Sector Code", fontsize=12)

As the "Sector Codes" bar graph shows, the most common sector code is 5250, with about 565 observations, and the least common is 5150, with about 19 observations.
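
The same reading can be taken from the counts directly instead of from the bars. A small sketch, assuming a toy series in place of `stock_list_df['33SectorCode']`; the values are invented for illustration.

```python
import pandas as pd

# Toy sector codes standing in for stock_list_df["33SectorCode"] (made-up data).
codes = pd.Series(["5250", "5250", "5250", "3650", "3650", "5150"])

counts = codes.value_counts()
most_common = counts.idxmax()   # label with the highest count
least_common = counts.idxmin()  # label with the lowest count
print(most_common, counts.max())
print(least_common, counts.min())
```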

Now let's plot the number of entries per "33SectorName" in stock_list_df, this time with seaborn! It's simple: we call sns.catplot with x set to "33SectorName" (x-axis), kind set to "count" (a bar graph of counts), and data set to stock_list_df. We also set aspect to 5 and height to 20, since there are a lot of sector names, and save the figure to 33sectornames.png with plt.savefig.

In [None]:
sns.catplot(x="33SectorName", kind="count", data=stock_list_df, aspect=5, height=20)
plt.savefig("33sectornames.png")

The x-axis and y-axis labels in the figure above come out very small, which is why we saved the figure: you can download it and zoom in with your photo viewer for a close look. We recommend doing so.

After that, let's use catplot again! It is similar to the count plot of 33SectorName above, but without the kind parameter, so catplot falls back to its default strip plot. We set x to "EffectiveDate", y to "33SectorName", data to stock_list_df, and aspect to 3.

In [None]:
sns.catplot(x="EffectiveDate", y="33SectorName", data=stock_list_df, aspect=3)

Finally for stock_list_df, let's look at the data distribution with Plotly! We first define dist by grouping stock_list_df by "SecuritiesCode", calling size to count the rows in each group, and calling reset_index with the name parameter set to 'total'. Next, we combine the two dataframes into a new one, stock_list_df_two, with pd.merge, passing stock_list_df and dist along with two parameters: how set to 'left' (the type of merge to perform) and on set to ['SecuritiesCode'] (the key column). We then redefine stock_list_df_two by grouping it by "17SectorName" and summing the 'total' column, again resetting the index with name set to 'total'.

Now for the fun part, plotting with Plotly! We define fig with px.bar (which creates a bar graph) and 4 parameters: x (x-axis input) set to the "17SectorName" column of stock_list_df_two, y (y-axis input) set to its "total" column, color (color legend) set to the same column as x, and color_continuous_scale set to "Emrld". We title the axes with update_xaxes ("Assets") and update_yaxes ("Number of Rows"). Finally, we call update_layout on fig with three parameters: showlegend set to True, title set to a dictionary with 5 entries (explained in the code cell below), and template set to "plotly_white". Then we display the figure with fig.show().

In [None]:
dist=stock_list_df.groupby('SecuritiesCode').size().reset_index(name='total')
stock_list_df_two=pd.merge(stock_list_df,dist, how='left',on=['SecuritiesCode'])
stock_list_df_two=stock_list_df_two.groupby(['17SectorName']).total.sum().reset_index(name='total')
fig = px.bar(x = stock_list_df_two["17SectorName"],
             y = stock_list_df_two['total'], 
             color = stock_list_df_two["17SectorName"],
             color_continuous_scale="Emrld") 
fig.update_xaxes(title="Assets")
fig.update_yaxes(title = "Number of Rows")
fig.update_layout(showlegend = True,
    title = {
        'text': 'Data Distribution ', # Text of title
        'y':0.95, # position of title in y-axis in numbers
        'x':0.5, # position of title in x-axis in numbers
        'xanchor': 'center', # direction of position over title (x)
        'yanchor': 'top'} , # direction of position over title (y)
        template="plotly_white")
fig.show()

After aggregating stock_list_df into stock_list_df_two, we see that the largest count belongs to 'IT & SERVICES, OTHERS' (the second bar in the graph) and the smallest to 'FOODS' (the first bar). The difference is hard to see at the default size, so you may need to zoom in with Plotly's zoom in/zoom out controls.
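
When every SecuritiesCode appears exactly once, the groupby, merge, groupby steps reduce to a plain row count per sector. The sketch below checks that on a toy frame; the data is invented, and it assumes codes are unique as they would be in a one-row-per-stock list.

```python
import pandas as pd

# Toy stand-in for stock_list_df with one row per security (made-up data).
toy = pd.DataFrame({
    "SecuritiesCode": [1301, 1332, 1333, 1375],
    "17SectorName": ["FOODS", "FOODS", "FOODS", "RETAIL TRADE"],
})

# Same three steps as the notebook cell.
dist = toy.groupby("SecuritiesCode").size().reset_index(name="total")
merged = pd.merge(toy, dist, how="left", on=["SecuritiesCode"])
via_merge = merged.groupby("17SectorName")["total"].sum()

# Direct row count per sector, sorted by name for comparison.
via_counts = toy["17SectorName"].value_counts().sort_index()
print(via_merge.tolist(), via_counts.tolist())
```

If the two lists match on the real data as well, `value_counts` alone is a shorter way to get the bar heights.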

And that's it! We have now completed our analysis of the stock_list_df dataframe! Now, let's warp into analyzing the financials_df dataframe!

### **Chapter 2: financials**
Now that we have analyzed the stock_list dataframe, let's proceed to the financials dataframe! We first count the "TypeOfDocument" entries with Plotly. We loosely follow the "17SectorName" analysis from Chapter 1, but this time we group by the "TypeOfDocument" column of financials_df.

In [None]:
dist_two=financials_df.groupby('SecuritiesCode').size().reset_index(name='total')
financials_df_two=pd.merge(financials_df,dist_two, how='left',on=['SecuritiesCode'])
financials_df_two=financials_df_two.groupby(['TypeOfDocument']).total.sum().reset_index(name='total')
fig = px.bar(x = financials_df_two["TypeOfDocument"],
             y = financials_df_two['total'], 
             color = financials_df_two["TypeOfDocument"],
             color_continuous_scale="Emrld") 
fig.update_xaxes(title="Assets")
fig.update_yaxes(title = "Number of Rows")
fig.update_layout(showlegend = True,
    title = {
        'text': 'Data Distribution ', # Text of title
        'y':0.95, # position of title in y-axis in numbers
        'x':0.5, # position of title in x-axis in numbers
        'xanchor': 'center', # direction of position over title (x)
        'yanchor': 'top'} , # direction of position over title (y)
        template="plotly_white")
fig.show()

From this chart, we see that the category with the most data is ForecastRevision, and the one with the least is 2QFinancialStatements_Consolidated_US.
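
One thing worth checking: a security appears many times in the financials data, so the summed 'total' is not the same as the number of rows per category. The toy example below (invented data, not the competition file) shows the two quantities diverging when codes repeat.

```python
import pandas as pd

# Toy stand-in for financials_df where code 1 files three documents (made-up data).
toy = pd.DataFrame({
    "SecuritiesCode": [1, 1, 1, 2],
    "TypeOfDocument": ["A", "A", "B", "B"],
})

# Notebook pipeline: rows per code, merged back, then summed per document type.
dist = toy.groupby("SecuritiesCode").size().reset_index(name="total")
merged = pd.merge(toy, dist, how="left", on=["SecuritiesCode"])
summed_totals = merged.groupby("TypeOfDocument")["total"].sum()

# Plain number of rows per document type.
row_counts = toy["TypeOfDocument"].value_counts().sort_index()
print(summed_totals.to_dict(), row_counts.to_dict())
```

Here each row contributes the size of its security's group, so categories used by frequently reporting securities are weighted more heavily than a simple row count would suggest.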

Next, we plot the categorical data of TypeOfCurrentPeriod against SecuritiesCode from financials_df with Seaborn's catplot! We call sns.catplot with four parameters: x set to "TypeOfCurrentPeriod", y set to "SecuritiesCode", kind set to "swarm", and data set to financials_df filtered to the rows whose SecuritiesCode lies strictly between 100 and 5000, using the between function with the inclusive parameter set to exclude both endpoints.

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
sns.catplot(x="TypeOfCurrentPeriod", y="SecuritiesCode", kind="swarm", data=financials_df[financials_df.SecuritiesCode.between(100, 5000, inclusive="neither")])

As you can see, many points are packed together above the 3Q, 2Q, FY, and 1Q categories, between roughly 1000 and 5000 on the y-axis. Note that we suppress all warnings before running the cell above, so you may need to unhide the warnings cell that sits just before the Seaborn cell.
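
To see what the `between` filter keeps, here is a small sketch on made-up codes. It uses the string form `inclusive="neither"`, which assumes pandas 1.3 or newer.

```python
import pandas as pd

# Made-up security codes sitting on and just inside the bounds.
codes = pd.Series([100, 101, 4999, 5000])

# "neither" drops both endpoints, so 100 and 5000 are excluded.
mask = codes.between(100, 5000, inclusive="neither")
print(codes[mask].tolist())
```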

We also visualize TypeOfCurrentPeriod against SecuritiesCode again, this time as a boxplot. We follow the swarm plot above, but set the kind parameter to "box".

In [None]:
sns.catplot(x="TypeOfCurrentPeriod", y="SecuritiesCode", kind="box", data=financials_df[financials_df.SecuritiesCode.between(100, 5000, inclusive="neither")])

Looking over the boxplot, we see that the 4Q category has at least one outlier, and 5Q is just a thick line, since it has only a single observation.
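
A per-category summary makes the "single thick line" case easy to confirm numerically. This is a sketch on invented data; the group labels mirror the plot but the values are not from the competition.

```python
import pandas as pd

# Made-up rows: "5Q" has a single observation, like the flat box in the plot.
toy = pd.DataFrame({
    "TypeOfCurrentPeriod": ["1Q", "1Q", "1Q", "FY", "FY", "5Q"],
    "SecuritiesCode": [1301, 2500, 4800, 1332, 3000, 2001],
})

# With one observation, min, median, and max coincide, so the box collapses.
stats = toy.groupby("TypeOfCurrentPeriod")["SecuritiesCode"].agg(
    ["count", "min", "median", "max"]
)
print(stats)
```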



Finally, we plot a line graph with Matplotlib! We loosely follow what we did for the 33SectorCode counts in stock_list_df, but change the kind parameter of plot to 'line' and the title parameter to "Current Periods". The labels passed to set_xlabel and set_ylabel become "Current Periods" (x) and "# on Each Current Period" (y).

In [None]:
ax = financials_df['TypeOfCurrentPeriod'].value_counts().plot(kind='line', title='Current Periods', figsize=(15, 10), fontsize=12)
ax.set_xlabel("Current Periods", fontsize=12) 
ax.set_ylabel("# on Each Current Period", fontsize=12)

So, we did it! We have gotten through most of our analysis of the financials_df dataframe! Now, let's fly on to the next chapter and analyze the options_df dataframe!

### **Chapter 3: options**
Now that we are done with stock_list_df and financials_df, let's advance to the options_df dataframe! First, let's look at the "WholeDayOpen" column with Matplotlib's histogram feature! We follow the line plot of current periods from the previous chapter, but rename the title and labels for "WholeDayOpen" and change the kind parameter to 'hist'. Note that, as written, the histogram is drawn over the output of value_counts, that is, over how often each distinct value occurs.

In [None]:
ax = options_df['WholeDayOpen'].value_counts().plot(kind='hist', title='Days Open', figsize=(15, 10), fontsize=12)
ax.set_xlabel("Time", fontsize=12) 
ax.set_ylabel("# of Stock", fontsize=12)

Running the cell above, we clearly see that most of the data falls between 0.0 and 0.5 on the x-axis, while the least falls around 3.0. Now let's create a candlestick plot with bokeh, since we are getting serious about plotting stocks.
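
Because the cell histograms the result of `value_counts()`, its x-axis measures how often a value occurs rather than the opening price itself. The sketch below, on made-up prices, shows the two different inputs a histogram could receive.

```python
import pandas as pd

# Made-up opening prices; 10.0 occurs three times.
values = pd.Series([10.0, 10.0, 10.0, 20.0, 30.0])

# What the notebook cell feeds to the histogram: occurrences per distinct value.
counts = values.value_counts()
print(sorted(counts.tolist()))

# What a histogram of the prices themselves would be built from.
print(sorted(values.tolist()))
```

To histogram the prices directly, one option is to call `plot(kind='hist')` on the column without `value_counts()`.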

First, we import four additional names: pi from the math module, figure from bokeh.plotting, output_notebook and show from bokeh.io, and INLINE from bokeh.resources. We then call output_notebook with the resources parameter set to INLINE, because BokehJS must be loaded before plotting; otherwise the bokeh plots cannot render.

Bokeh's candlestick plots need a date index, so we replace the index of options_df with a pd.DatetimeIndex built from its 'Date' column. We then create two boolean masks: inc, where WholeDayClose is greater than WholeDayOpen, and dec, where WholeDayOpen is greater than WholeDayClose. The variable w is set to 12 * 60 * 60 * 1000, which is half a day in milliseconds and serves as the candle width.

Next, we define p as a new figure with four parameters: x_axis_type (type of the x-axis) set to "datetime", plot_width set to 800, plot_height set to 500, and title set to "Whole Day Stocks". We call p.segment with the index, WholeDayHigh, the index again, and WholeDayLow, with color set to "black", to draw the high-low wicks.

Finally, we call p.vbar twice. Each call takes the index filtered by inc (dec for the 2nd call), the width w, and the WholeDayOpen and WholeDayClose values filtered the same way, plus two parameters: fill_color set to "lawngreen" ("tomato" for the 2nd call) and line_color set to "red" ("lime" for the 2nd call). Afterwards, we call show with p to display the figure.

In [None]:
from math import pi
from bokeh.plotting import figure
from bokeh.io import output_notebook,show
from bokeh.resources import INLINE

output_notebook(resources=INLINE)
options_df.index = pd.DatetimeIndex(options_df['Date'])

inc = options_df.WholeDayClose > options_df.WholeDayOpen
dec = options_df.WholeDayOpen > options_df.WholeDayClose

w = 12*60*60*1000

p = figure(x_axis_type="datetime", plot_width=800, plot_height=500, title = "Whole Day Stocks")
p.segment(options_df.index, options_df.WholeDayHigh, options_df.index, options_df.WholeDayLow, color="black")
p.vbar(options_df.index[inc], w, options_df.WholeDayOpen[inc], options_df.WholeDayClose[inc], fill_color="lawngreen", line_color="red")
p.vbar(options_df.index[dec], w, options_df.WholeDayOpen[dec], options_df.WholeDayClose[dec], fill_color="tomato", line_color="lime")

show(p)

As always, the candlesticks are far too small to see at first, so zoom in a few times with bokeh's built-in zoom tool. That said, we are not certain that this plot renders in every environment.
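
Two details of the candlestick setup can be checked without drawing anything: the width w is half a day expressed in milliseconds (the unit of bokeh's datetime axis), and a day that closes exactly at its open falls into neither mask. The frame below is made up for illustration.

```python
import pandas as pd

# Made-up open/close values: one up day, one down day, one flat day.
toy = pd.DataFrame(
    {"WholeDayOpen": [100.0, 105.0, 103.0], "WholeDayClose": [104.0, 101.0, 103.0]},
    index=pd.DatetimeIndex(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

inc = toy.WholeDayClose > toy.WholeDayOpen  # candles that closed higher
dec = toy.WholeDayOpen > toy.WholeDayClose  # candles that closed lower

w = 12 * 60 * 60 * 1000  # 12 hours in milliseconds
half_day_ms = pd.Timedelta(hours=12).total_seconds() * 1000
print(inc.tolist(), dec.tolist(), w == half_day_ms)
```

Flat days therefore get a wick from `segment` but no body from either `vbar` call.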

And that is the end of Chapter 3 on the options_df dataframe. Let's proceed to Chapter 4 and the secondary_stock_prices dataframe!

### **Chapter 4: secondary_stock_prices**
Next, let's swoop in on sec_stock_prices_df! This time, let's just use Plotly, since we are experiencing some problems with Bokeh. First, we create a function, get_data, with a stock_id parameter and a data parameter that defaults to sec_stock_prices_df. Inside, we define get_data_df by keeping the rows whose SecuritiesCode equals stock_id and calling reset_index with the drop parameter set to True. Next, we convert the Date column to datetimes with pd.to_datetime and make it the index with set_index("Date"). Finally, we return get_data_df.

Outside the function, we analyze "SAMSUNG KODEX200 SECURITIES EXCHANGE TRADED FUND [STOCK]" and "Nikko Exchange Traded Index Fund 225" by defining two variables, samsung and nikko, with get_data calls in which stock_id is set to 1313 (1330 for nikko).

In [None]:
def get_data(stock_id, data=sec_stock_prices_df):
    # Filter the dataframe passed in (not the global) down to one security
    get_data_df = data[data["SecuritiesCode"] == stock_id].reset_index(drop=True)
    # Parse the dates and use them as the index
    get_data_df["Date"] = pd.to_datetime(get_data_df["Date"])
    get_data_df = get_data_df.set_index("Date")
    return get_data_df

samsung = get_data(stock_id=1313)
nikko = get_data(stock_id=1330)
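
The filter-then-index pattern can be tried on a toy frame before running it on the full file. In this sketch, `select_stock` is a hypothetical stand-in for get_data and the prices are invented.

```python
import pandas as pd

# Made-up price rows for two securities.
toy_prices = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-04", "2021-01-05"],
    "SecuritiesCode": [1313, 1330, 1313],
    "Close": [100.0, 200.0, 101.0],
})

def select_stock(stock_id, data):
    """Hypothetical helper: rows for one security, indexed by parsed date."""
    out = data[data["SecuritiesCode"] == stock_id].copy()
    out["Date"] = pd.to_datetime(out["Date"])
    return out.set_index("Date")

one = select_stock(1313, toy_prices)
print(one)
```

Indexing by a DatetimeIndex is what lets the later candlestick charts place each row on a time axis.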

Now, let's create another function called sec_candlestick_chart, taking the data (a dataframe returned by get_data) and a title. Inside the function, we import plotly's graph_objects submodule as go. We then define candlestick as go.Figure with the data parameter set to a list holding one go.Candlestick trace with 5 parameters: x set to data.index, and open, high, low, and close set to the "Open", "High", "Low", and "Close" columns of data.

After that, we call update_xaxes on the candlestick figure, setting title_text to 'Time' to label the x-axis and rangeslider_visible to True, since we want to be able to zoom in on a range of dates. We then call update_layout with two parameters: title set to a dictionary with 5 entries (explained in the code cell below) and template set to any built-in choice you'd like (mine's "seaborn"). We add a y-axis label with update_yaxes, setting title_text to the price label and ticksuffix to the currency symbol. Finally, we return the candlestick figure.

In [None]:
def sec_candlestick_chart(data, title):
    import plotly.graph_objects as go
    
    candlestick = go.Figure(data=[go.Candlestick(x=data.index, open=data["Open"], high=data["High"], low=data["Low"], close=data["Close"])])
    candlestick.update_xaxes(title_text='Time', rangeslider_visible=True)
    
    candlestick.update_layout(
        title={
            'text': '{:} Candlestick Chart (Sec)'.format(title), # text to title our candlestick chart
            'y': 0.9, # y-position
            'x': 0.5, # x-position
            'xanchor': 'center', # position in x-axis
            'yanchor': 'top'}, # position in y-axis
        template="seaborn")
    
    candlestick.update_yaxes(title_text='Price in JPY', ticksuffix='¥')  # Tokyo-listed prices are quoted in yen
    return candlestick

With sec_candlestick_chart in place, let's try it on Samsung's stock data! We define samsungplot by calling sec_candlestick_chart with the samsung dataframe and title set to "Samsung". Then we show Samsung's candlestick chart by calling show on samsungplot!

In [None]:
samsungplot = sec_candlestick_chart(samsung, title="Samsung")
samsungplot.show()

Now let's look at the Nikko data in a candlestick chart! It works the same as for Samsung, except that sec_candlestick_chart receives the nikko dataframe and the title is set to "Nikko Exchange".

In [None]:
nikkoplot = sec_candlestick_chart(nikko, title="Nikko Exchange")
nikkoplot.show()

After running the two plots above, we see that the Samsung and Nikko Exchange candlestick charts show historical market data from 2017 to about 2022, including the [2020 Stock Market Crash in JPX](https://www.cnn.com/2020/09/30/investing/global-stocks/index.html).

Now, let's analyze the stock volatility in Samsung's (and Nikko Exchange's) market data!

First, we add a "Return" column to the samsung and nikko dataframes: the Close column divided by itself shifted by one row (shift(1)), minus 1. This gives the return from one trading day to the next.

In [None]:
samsung["Return"] = (samsung["Close"]/samsung["Close"].shift(1)) - 1

In [None]:
nikko["Return"] = (nikko["Close"]/nikko["Close"].shift(1)) - 1
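
The return formula is the same quantity that pandas computes with `pct_change`. A quick sketch on made-up closing prices compares the two; the first row has no previous close, so it is NaN in both.

```python
import numpy as np
import pandas as pd

# Made-up closing prices.
close = pd.Series([100.0, 102.0, 99.96])

manual = close / close.shift(1) - 1  # formula used in the notebook
builtin = close.pct_change()         # pandas shortcut for the same ratio
print(manual.round(4).tolist())
```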

Next, we create histograms of the returns to see the stock volatility of Samsung and Nikko Exchange! To display the figure we use iplot, so we import download_plotlyjs, init_notebook_mode, plot, and iplot from plotly's offline submodule (only iplot is used here). We define fig with px.histogram, passing the samsung dataframe and the x parameter set to "Return". We then call update_layout on fig with two parameters: plot_bgcolor (the plot's background color) set to "white" and title_text (the title) set to "Stock Volatility of Samsung". We update the axes with update_yaxes and update_xaxes, whose parameters are: showticklabels (show tick labels) set to True, showline (show the axis line) set to True, linewidth set to 2, linecolor set to "black", and title_text set to "Value" for the y-axis ("Return" for the x-axis), wrapped in `<b></b>`, the HTML tag for bold text. Finally, we call iplot with fig.

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot 
fig = px.histogram(samsung, x = "Return")

fig.update_layout(
    plot_bgcolor = "white",
    title_text="Stock Volatility of Samsung"
)

fig.update_yaxes(showticklabels = True, showline = True, linewidth = 2, linecolor = "black",
                title_text="<b>Value</b>")
fig.update_xaxes(showticklabels = True, showline = True, linewidth = 2, linecolor = "black", 
                 title_text="<b>Return</b>")
iplot(fig)

Now let's apply the same steps to nikko: we pass the nikko dataframe to px.histogram and set title_text in update_layout to "Stock Volatility of Nikko Exchange".

In [None]:
fig = px.histogram(nikko, x = "Return")

fig.update_layout(
    plot_bgcolor = "white",
    title_text="Stock Volatility of Nikko Exchange"
)

fig.update_yaxes(showticklabels = True, showline = True, linewidth = 2, linecolor = "black",
                title_text="<b>Value</b>")
fig.update_xaxes(showticklabels = True, showline = True, linewidth = 2, linecolor = "black", 
                 title_text="<b>Return</b>")
iplot(fig)

As you can see, most of the returns fall between -0.04 and 0.04 for Nikko Exchange (-0.05 and 0.05 for Samsung).
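
The spread seen in the histograms can also be put into numbers. A minimal sketch on invented returns, using the standard deviation as a rough volatility measure and the share of days inside a chosen band; the band of 0.02 is an arbitrary illustration.

```python
import pandas as pd

# Made-up daily returns.
returns = pd.Series([-0.04, -0.01, 0.0, 0.01, 0.04])

# Standard deviation plus the extremes of the sample.
summary = returns.agg(["std", "min", "max"])

# Fraction of days whose return lies within +/- 0.02 (bounds included).
inside = returns.between(-0.02, 0.02).mean()
print(summary)
print(inside)
```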

And with that, we have analyzed the market data and volatility of Samsung and Nikko Exchange! With all that said, let's jump into the next chapter and analyze the stock_prices_df dataframe!

### **Chapter 5: stock_prices**
After our Chapter 4 analysis of the sec_stock_prices_df dataframe, let's move on to the stock_prices_df dataframe! We had enough of Bokeh earlier, so let's plot our candlestick figures in Matplotlib and Plotly!

First, we define a function called get_data_two, which is similar to the earlier get_data function but reads from stock_prices_df, since we need the stock data of Kyokuyo and Maruha Fisheries. We then define two variables, kyokuyo and maruha, by calling get_data_two with the stock_id parameter set to 1301 (1333 for maruha).

In [None]:
def get_data_two(stock_id, data=stock_prices_df):
    # Filter the dataframe passed in (not the global) down to one security
    get_data_df = data[data["SecuritiesCode"] == stock_id].reset_index(drop=True)
    # Parse the dates and use them as the index
    get_data_df["Date"] = pd.to_datetime(get_data_df["Date"])
    get_data_df = get_data_df.set_index("Date")
    return get_data_df

In [None]:
kyokuyo = get_data_two(stock_id=1301)
maruha = get_data_two(stock_id=1333)

After that, let's define another function called candlestick_chart, which is similar to sec_candlestick_chart and is used for the Plotly plots of the Kyokuyo and Maruha Corporation stock prices.

In [None]:
def candlestick_chart(data, title):
    import plotly.graph_objects as go
    
    candlestick = go.Figure(data=[go.Candlestick(x=data.index, open=data["Open"], high=data["High"], low=data["Low"], close=data["Close"])])
    candlestick.update_xaxes(title_text='Time', rangeslider_visible=True)
    
    candlestick.update_layout(
        title={
            'text': '{:} Candlestick Chart'.format(title), # text to title our candlestick chart
            'y': 0.9, # y-position
            'x': 0.5, # x-position
            'xanchor': 'center', # position in x-axis
            'yanchor': 'top'}, # position in y-axis
        template="seaborn")
    
    candlestick.update_yaxes(title_text='Price in JPY', ticksuffix='¥')  # Tokyo-listed prices are quoted in yen
    return candlestick

Next, let's build a candlestick chart in Matplotlib! First, we create a new figure with plt.figure. Then, we set two variables, width and width2, to .4 and .05; these are the widths of the candle bodies and the wicks. We also split the prices into two dataframes: up, the rows of kyokuyo where Close is greater than or equal to Open, and down, the rows where Close is less than Open. Next, we define the color variables col1 and col2 as 'green' and 'red'.

Now for the most exciting part, plotting the prices. For the up prices, we call plt.bar three times on the up dataframe, always with up.index on the x-axis and color set to col1: the body (height Close minus Open, width width, bottom at Open), the upper wick (height High minus Close, width width2, bottom at Close), and the lower wick (height Low minus Open, width width2, bottom at Open).

We then plot the down prices the same way on the down dataframe with color set to col2: the body (height Close minus Open, width width, bottom at Open), the upper wick (height High minus Open, width width2, bottom at Open), and the lower wick (height Low minus Close, width width2, bottom at Close).

We then rotate the x-axis tick labels with plt.xticks, setting the rotation parameter to 45 and ha to 'right'. Finally, we display the candlestick chart with plt.show.

In [None]:
#create figure
plt.figure()

#define width of candlestick elements
width = .4
width2 = .05

#define up and down prices
up = kyokuyo[kyokuyo.Close>=kyokuyo.Open]
down = kyokuyo[kyokuyo.Close<kyokuyo.Open]

#define colors to use
col1 = 'green'
col2 = 'red'

#plot up prices
plt.bar(up.index,up.Close-up.Open,width,bottom=up.Open,color=col1)
plt.bar(up.index,up.High-up.Close,width2,bottom=up.Close,color=col1)
plt.bar(up.index,up.Low-up.Open,width2,bottom=up.Open,color=col1)

#plot down prices
plt.bar(down.index,down.Close-down.Open,width,bottom=down.Open,color=col2)
plt.bar(down.index,down.High-down.Open,width2,bottom=down.Open,color=col2)
plt.bar(down.index,down.Low-down.Close,width2,bottom=down.Close,color=col2)

#rotate x-axis tick labels
plt.xticks(rotation=45, ha='right')

#display candlestick chart
plt.show()

As you can see, the Matplotlib candlestick chart for Kyokuyo looks quite similar to the Plotly one from the last chapter. In the Matplotlib version, however, the high and low wicks are faint and look somewhat separated from the candle bodies, so we plot the Plotly version again below for comparison.

In [None]:
kyokuyoplot = candlestick_chart(kyokuyo, title="Kyokuyo")
kyokuyoplot.show()
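
One possible way to make the Matplotlib wicks easier to see is to draw them as lines with `vlines` instead of very narrow bars. The sketch below is only an illustration under stated assumptions: the helper name `plot_candlestick_mpl`, the `wick_width` value, and the tiny sample dataframe are made up here and are not part of the original notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_candlestick_mpl(df, ax=None, body_width=0.4, wick_width=1.5):
    """Draw candlesticks from Open/High/Low/Close columns (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()

    # Same split as in the cells above: up days close at or above the open
    up = df[df.Close >= df.Open]
    down = df[df.Close < df.Open]

    for part, color in ((up, "green"), (down, "red")):
        # Wick: one vertical line from the day's Low to its High
        ax.vlines(part.index, part.Low, part.High, colors=color, linewidth=wick_width)
        # Body: a bar spanning Open to Close (negative height on down days)
        ax.bar(part.index, part.Close - part.Open, body_width,
               bottom=part.Open, color=color)

    ax.tick_params(axis="x", labelrotation=45)
    return ax


# Tiny made-up sample, only to exercise the helper
sample = pd.DataFrame({
    "Open": [100.0, 103.0, 101.0],
    "High": [105.0, 104.0, 106.0],
    "Low": [99.0, 98.0, 100.0],
    "Close": [103.0, 99.0, 105.0],
})
ax = plot_candlestick_mpl(sample)
```

With the notebook's data, this would be called as `plot_candlestick_mpl(kyokuyo)` followed by `plt.show()`.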

Now let's plot out the stock data of Maruha Corporation by using Matplotlib and Plotly again!

In [None]:
#create figure
plt.figure()

#define width of candlestick elements
width = .4
width2 = .05

#define up and down prices
up = maruha[maruha.Close>=maruha.Open]
down = maruha[maruha.Close<maruha.Open]

#define colors to use
col1 = 'green'
col2 = 'red'

#plot up prices
plt.bar(up.index,up.Close-up.Open,width,bottom=up.Open,color=col1)
plt.bar(up.index,up.High-up.Close,width2,bottom=up.Close,color=col1)
plt.bar(up.index,up.Low-up.Open,width2,bottom=up.Open,color=col1)

#plot down prices
plt.bar(down.index,down.Close-down.Open,width,bottom=down.Open,color=col2)
plt.bar(down.index,down.High-down.Open,width2,bottom=down.Open,color=col2)
plt.bar(down.index,down.Low-down.Close,width2,bottom=down.Close,color=col2)

#rotate x-axis tick labels
plt.xticks(rotation=45, ha='right')

#display candlestick chart
plt.show()

As we did with Kyokuyo, feel free to compare the Matplotlib candlestick chart with the Plotly one!

In [None]:
maruhaplot = candlestick_chart(maruha, title="Maruha")
maruhaplot.show()

After analyzing the two plots from Kyokuyo and Maruha, let's analyze the stock volatility by using Seaborn.

We add a `Return` column to both the `kyokuyo` and `maruha` dataframes: each day's `Close` divided by the previous day's `Close` (obtained with `shift(1)`), minus 1.

In [None]:
kyokuyo["Return"] = (kyokuyo["Close"]/kyokuyo["Close"].shift(1)) - 1
maruha["Return"] = (maruha["Close"]/maruha["Close"].shift(1)) - 1
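
The formula above (Close divided by the previous Close, minus 1) is the simple daily return. As a quick check on made-up prices (not JPX data), pandas' built-in `pct_change` should give the same numbers. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up closing prices, only for illustration
close = pd.Series([100.0, 102.0, 99.96, 99.96])

manual = close / close.shift(1) - 1   # same formula as the cell above
builtin = close.pct_change()          # pandas' built-in simple return

# The first value is NaN in both because there is no previous close
print(manual.round(4).tolist())
print(np.allclose(manual, builtin, equal_nan=True))
```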

Next, we use Seaborn to look at the volatility of Kyokuyo and Maruha by calling `sns.displot` on each dataframe with the `x` parameter set to `"Return"`.

In [None]:
sns.displot(kyokuyo, x="Return")

In [None]:
sns.displot(maruha, x="Return")
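
The histograms show the shape of each return distribution, and a single number can complement them. One common convention (an assumption here, not something this notebook computes) is to summarise volatility as the standard deviation of daily returns. The sketch uses made-up returns and a hypothetical helper named `daily_volatility`.

```python
import pandas as pd


def daily_volatility(returns: pd.Series) -> float:
    """Sample standard deviation of daily returns, ignoring NaN (hypothetical helper)."""
    return float(returns.dropna().std())


# Made-up daily returns; the leading NaN mimics the first row after shift(1)
calm = pd.Series([float("nan"), 0.001, -0.002, 0.001, 0.000])
wild = pd.Series([float("nan"), 0.030, -0.040, 0.020, -0.010])

print(daily_volatility(calm), daily_volatility(wild))
```

With the notebook's data, this would be called as `daily_volatility(kyokuyo["Return"])` and `daily_volatility(maruha["Return"])`.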

Looking closely at the two return distributions for Kyokuyo and Maruha, we see that both peak at a return of about 0.00. With that, we have finished analyzing the stock_prices_df dataframe! Now, let's move on to the final analysis: the trades_df dataframe!

### **Chapter 6: trades**
We're now on the final chapter of our JPX analysis! First of all, let's look at how the rows of trades_df are split across Sections by using a Matplotlib pie chart!

We first create a figure and axes, `fig1` and `ax`, with `plt.subplots()`. Next, we call `value_counts()` on the `"Section"` column of trades_df to count how often each value occurs, then call `plot` on the result and reassign the returned axes to `ax`. We pass five parameters: `autopct='%1.1f%%'` to label each slice with its percentage to one decimal place, `kind='pie'` to choose the plot type, `title='Sections'` to set the title, `figsize=(15, 10)` to set the width and height of the figure, and `fontsize=12` to set the size of the labels. We also call `plt.axis('off')` to hide the axis. Finally, we show the figure with `plt.show()`.

In [None]:
fig1, ax = plt.subplots()
ax = trades_df["Section"].value_counts().plot(autopct='%1.1f%%', kind='pie', title='Sections', figsize=(15, 10), fontsize=12)
plt.axis('off')
plt.show()

In this pie chart, Prime Market (First Section), Standard Market (Second Section), and Growth Market (Mothers/JASDAQ) each account for 33.3% of the rows.
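
The percentages on the pie chart can also be read off as numbers. A minimal sketch on a made-up `Section` column (the values are placeholders, not the real trades data): `value_counts(normalize=True)` returns each category's share of the rows.

```python
import pandas as pd

# Placeholder data with three equally frequent sections
sections = pd.Series(["A", "B", "C"] * 4, name="Section")

# Share of rows per category, expressed in percent
shares = sections.value_counts(normalize=True) * 100
print(shares.round(1))
```

With the notebook's data, the same call would be `trades_df["Section"].value_counts(normalize=True)`.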

Now, let's look at the share of each start date by using Plotly! We first build a `startdate` dataframe from the `StartDate` column of trades_df: `value_counts()` counts how often each date occurs, `rename_axis("StartDate")` names the resulting index, and `reset_index(name="count")` turns the result into a dataframe with a `StartDate` column and a `count` column.

We then define another variable, `fig`, with `px.pie`, passing `startdate.head(20)` (the first 20 rows), `values='count'`, `names='StartDate'`, and `title='StartDate Counts'`. Finally, we show the figure with `fig.show()`.

In [None]:
# Count rows per start date; naming the index and the counts explicitly gives
# 'StartDate' and 'count' columns however pandas names the value_counts result
startdate = trades_df["StartDate"].value_counts().rename_axis("StartDate").reset_index(name="count")

fig = px.pie(startdate.head(20), values='count', names='StartDate', title='StartDate Counts')
fig.show()
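
Because the pie chart is built from only the first 20 rows of `startdate`, its percentages are shares of those 20 rows' total, not of the whole dataframe. If every date has the same count, each slice comes out at 1/20, or 5%. A small check with made-up dates (illustrative only, not the real trades data):

```python
import pandas as pd

# 20 made-up start dates, each appearing 3 times
dates = pd.Series(
    [f"date_{i:02d}" for i in range(20) for _ in range(3)],
    name="StartDate",
)

# One row per date with its count
counts = dates.value_counts().rename_axis("StartDate").reset_index(name="count")

# Share of each slice among the first 20 rows, in percent
top20 = counts.head(20)
slice_share = top20["count"] / top20["count"].sum() * 100
print(slice_share.round(2).unique())
```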

As you can see, each of the 20 start dates shown accounts for 5% of the chart, with a count of 3. That wraps up Chapter 6, since there isn't much more to explore in the trades data.

## Conclusion
Having analyzed the six dataframes for this JPX competition, we now have a much better understanding of the data we need to predict future market data in Tokyo. So, what else can we do? We can be like businessmen, or we can continue predicting stock exchanges. Well, it's our choice.