# Introduction to the Plotly library for charts

Plotly is a company that offers several products. Some are paid online data analytics services, others are free and open source. Its charting library has a Python interface, so you can create interactive charts by writing Python code and view them directly in a Jupyter notebook.

We recommend reading the Plotly tutorial for Python and Jupyter (https://plot.ly/python/ipython-notebook-tutorial/).

Plotly is a charting library built on top of the very flexible [D3.js](https://d3js.org/) JavaScript library. D3.js places almost no limits on what a chart can look like. If you need something more flexible than Plotly, look into [D3.js](https://d3js.org/) directly. Be warned: D3.js is powerful but fairly complicated at first, so expect to spend a couple of days getting a feel for how it works.

## Step 0: Install Plotly

The Plotly library is not included with Python (or Pandas) by default, so you have to install it. To do that, open the Anaconda Prompt (on Windows: search for "Anaconda" and run it), then run the following command:

```pip install plotly```

The installation takes a while. It installs the Plotly library into the active Python environment. Note: there may be several Python environments on your machine if you have installed something in addition to the Anaconda package. Whenever you install a library, make sure you install it into the correct Python environment.
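To confirm which environment a notebook is actually running in, a quick check like the following may help. This is a minimal sketch using only the standard library; the helper name `describe_package` is made up for illustration.

```python
import importlib.util
import sys


def describe_package(name):
    """Return (interpreter path, True if `name` can be imported here)."""
    spec = importlib.util.find_spec(name)
    return sys.executable, spec is not None


interpreter, installed = describe_package("plotly")
print("Python interpreter:", interpreter)
print("plotly importable:", installed)
```

If the second line prints `False`, the library was most likely installed into a different environment than the one running the notebook.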

## Step 1: Imports and setup

In [1]:
# Import plotly chart types
import plotly.graph_objs as go

In [2]:
# We will want to work offline, without Plotly cloud services
import plotly.offline as ploff
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

In [3]:
# Set up Plotly for use inside the Jupyter notebook.
# connected=True loads plotly.js from the online CDN, which keeps the notebook small
# but needs an internet connection; connected=False embeds the library in the notebook
init_notebook_mode(connected=True)

For this example we also need to import Pandas, although it is not strictly required for Plotly.

In [4]:
import pandas as pd

## Step 2: Prepare data

In [5]:
# We read data from a CSV file
houses = pd.read_csv("real_estate.csv")

## Step 3: Prepare the chart elements

Each figure in Plotly can show several chart elements. For example, you can combine a bar chart (one element) with a line chart (another element). In many cases you will need just one element.

In this example we create a scatter plot. The area of each property goes on the x axis as the independent variable. The price goes on the y axis as the dependent variable.

In [6]:
scatter = go.Scatter(
    x=houses["sq__ft"],
    y=houses["price"],
    mode="markers")

The figure will contain only that scatter plot.

In [7]:
figure_elements = [scatter]

Now we plot the figure.

In [8]:
iplot(figure_elements)

## Cleaned version (a side step)

We see that many houses have an area of zero square feet (or an empty value) but a valid price.
Let's filter these out and keep only the valid data. This has nothing to do with Plotly; it is a defect in our data.

In [9]:
valid_area = houses["sq__ft"] > 0
valid_houses = houses[valid_area]
print("Total houses in the dataset: %i, houses with valid area: %i" % (len(houses), len(valid_houses)))

Total houses in the dataset: 985, houses with valid area: 814
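The comparison `> 0` also drops rows where the area is missing, because any comparison with `NaN` evaluates to `False`. A minimal sketch on a small made-up DataFrame (the values are invented; only the column names match the dataset):

```python
import pandas as pd

# Made-up sample: one valid row, one zero area, one missing area
sample = pd.DataFrame({
    "sq__ft": [836, 0, float("nan")],
    "price": [59222, 68212, 68880],
})

valid_mask = sample["sq__ft"] > 0  # NaN > 0 evaluates to False
valid_sample = sample[valid_mask]

print(valid_mask.tolist())
print(len(sample), len(valid_sample))
```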


Now in the same manner we create another scatter plot.

In [10]:
scatter2 = go.Scatter(
    x=valid_houses["sq__ft"],
    y=valid_houses["price"],
    mode="markers"
)
fig_elems2 = [scatter2]
iplot(fig_elems2)

## Other chart types

There are many different chart types in Plotly. Check out the list of [basic chart types](https://plot.ly/python/basic-charts/). You can find many more chart types on the [Plotly Python start page](https://plot.ly/python/).

Let's count how many properties of each type are in the dataset, and visualize it as a bar chart.

The `groupby()` with `count()` syntax is a bit unusual: `count()` returns the number of non-empty values for every column, so when no values are missing, all columns hold the same number. We then need to take a single column (we use the "city" column, but could take any other).

The result is a `pandas.Series`. To get the house types, we access the `.index` property of the resulting Series. To get the number of houses, we access its `.values` property. In short: we show the house types on the x axis and the number of houses on the y axis.
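A small sketch of what `count()` returns, using an invented DataFrame rather than the real dataset. It also shows `size()`, a standard pandas method that counts rows per group directly and returns a Series, so no column has to be picked; whether that reads better is a matter of taste.

```python
import pandas as pd

# Invented sample data; the column names mimic the real dataset
sample = pd.DataFrame({
    "type": ["Residential", "Condo", "Residential", "Condo", "Residential"],
    "city": ["SACRAMENTO", "SACRAMENTO", "ELK GROVE", None, "GALT"],
    "price": [100, 120, 95, 110, 105],
})

grouped = sample.groupby("type")

# count() gives the number of non-null values per column and group
counts = grouped.count()
print(counts)

# Taking one column yields a Series: index = types, values = counts
by_city = counts["city"]
print(list(by_city.index), list(by_city.values))

# size() counts rows per group, including rows with missing values
sizes = grouped.size()
print(sizes.to_dict())
```

Note how the missing city in one "Condo" row makes the "city" count differ from the "price" count, which is why the choice of column can matter when the data has gaps.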

In [11]:
type_grouping = valid_houses.groupby("type")
house_types = type_grouping.count()["city"]
bar_chart = go.Bar(x=house_types.index, y=house_types.values)
fig_elems3 = [bar_chart]
iplot(fig_elems3)

## Statistical charts

Plotly also offers various [statistical charts](https://plot.ly/python/statistical-charts/), such as box plots.

Let's make a box plot showing the property prices.

In [12]:
box_chart = go.Box(y=valid_houses.price, name="Prices")
figure_elements = [box_chart]
iplot(figure_elements)
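To read the box plot it helps to know the numbers behind it: the box spans the first to the third quartile, and the line inside marks the median. A rough sketch with pandas on invented prices; Plotly may compute quartiles with a slightly different method, so the values need not match the chart exactly.

```python
import pandas as pd

prices = pd.Series([100, 200, 300, 400, 500])  # invented values

# Quartiles at 25 %, 50 % and 75 % of the sorted data
q1, median, q3 = prices.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range, the height of the box
print(q1, median, q3, iqr)
```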

## Multiple charts in a single plot

Sometimes you want to show multiple charts in a single plot, for example a series of bar charts, or a bar chart combined with a line chart.

To do that, create several chart elements and combine them in a single figure.

We will create an imaginary "Average price" line chart and combine it with the previous bar chart.

Mini challenge for you: calculate the real average price for each house type, using `.groupby()` and `.mean()`.

In [13]:
avg_prices = [100, 120, 95]
# Plotly has no separate line chart type: a line chart is a go.Scatter whose mode includes lines
line_chart = go.Scatter(x=house_types.index, y=avg_prices)

iplot([bar_chart, line_chart])
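One possible approach to the mini challenge, sketched on an invented DataFrame (the prices are made up, and `sample` is a hypothetical stand-in for `valid_houses`):

```python
import pandas as pd

# Invented data with the same column names as the real dataset
sample = pd.DataFrame({
    "type": ["Residential", "Condo", "Residential", "Condo", "Multi-Family"],
    "price": [200000, 100000, 300000, 150000, 250000],
})

# Mean price per property type; the result is a Series indexed by type
avg_price_by_type = sample.groupby("type")["price"].mean()
print(avg_price_by_type.to_dict())
```

The resulting Series can be used like `house_types` above: `.index` for the x axis and `.values` for the y axis.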

## Formatting the figure

Now we want to add some style to the figure. To control the style, we need one more step: instead of plotting the chart elements directly, we first combine them in a Figure object. The Figure object allows us to set [layout options](https://plot.ly/python/#layout-options).

Let's name the series correctly and add a title.

In [14]:
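figure_layout = go.Layout(
    title='House statistics by type',
    # Axis titles are passed as title=dict(text=..., font=...);
    # the older titlefont attribute is deprecated in recent Plotly versions
    xaxis=dict(
        title=dict(
            text='Property type',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#7f7f7f'
            )
        )
    ),
    yaxis=dict(
        title=dict(
            text='Number of houses',
            font=dict(
                size=12,
                color='#7f7f7f'
            )
        )
    )
)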

# Recreate the charts with correct legend names
line_chart_with_legend = go.Scatter(x=house_types.index, y=avg_prices, name="Average price") 
bar_chart_with_legend = go.Bar(x=house_types.index, y=house_types.values, name="Number of houses")
# we could also simply set the name for existing chart elements:
# bar_chart.name = "Number of houses"

figure_elements = [bar_chart_with_legend, line_chart_with_legend]
figure = go.Figure(data=figure_elements, layout=figure_layout)
iplot(figure)

See other options for legend layout in [Plotly documentation](https://plot.ly/python/legend/).

Now we resize the chart. Some options are set directly on the chart element, others on the layout.

In [15]:
figure_layout.width = 600 # Width for the whole figure
bar_chart_with_legend.width = [0.7, 0.5, 0.3] # width for bars

figure = go.Figure(data=figure_elements, layout=figure_layout)
iplot(figure)

## Add second axis

In [16]:
# First we define the second axis
figure_layout.yaxis2 = dict(
    title='Price, USD',
    overlaying='y',
    side='right',
    showgrid = False, # hide second gridline
    range=[75, 150]
)
# Then we specify in the line chart, that secondary axis should be used
line_chart_with_legend.yaxis = "y2"

# Disable gridlines and legend to reduce noise
figure_layout.yaxis.showgrid = False
figure_layout.showlegend = False

figure_layout.width = None # Disable the width that we set before

figure_elements = [bar_chart_with_legend, line_chart_with_legend]
figure = go.Figure(data=figure_elements, layout=figure_layout)
iplot(figure)

## Exporting a report

You can export the whole notebook as an HTML file. Some of the JavaScript libraries will still be downloaded from the cloud, so without an internet connection the reader will not be able to view the charts. The data itself is kept in the notebook.

## 100% Offline charts

If you want to create an offline HTML chart that does not require an internet connection, that is possible as well:

In [17]:
ploff.plot(fig_elems2, filename="my-cool-plot.html")

'my-cool-plot.html'

The resulting chart will be saved to a self-contained HTML file.