---
format: 
  html:
    toc: true
execute:
  echo: true
---

# Advanced Visualization

## Interactive word cloud diagram

In [None]:
#| echo: true 
#| code-fold: true
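from wordcloud import WordCloud, STOPWORDS
import numpy as np
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as py  # provides init_notebook_mode and iplot
py.init_notebook_mode(connected=True)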
text = " ".join(str(each) for each in df.text)
wc = WordCloud(stopwords = set(STOPWORDS),
               max_words = 200,
               max_font_size = 100)
wc.generate(text)
word_list=[]
freq_list=[]
fontsize_list=[]
position_list=[]
orientation_list=[]
color_list=[]
for (word, freq), fontsize, position, orientation, color in wc.layout_:
    word_list.append(word)
    freq_list.append(freq)
    fontsize_list.append(fontsize)
    position_list.append(position)
    orientation_list.append(orientation)
    color_list.append(color)
# get the positions
x=[]
y=[]
for i in position_list:
    x.append(i[0])
    y.append(i[1])
# scale the relative occurrence frequencies to font sizes
new_freq_list = [freq * 100 for freq in freq_list]
trace = go.Scatter(x=x, 
                   y=y, 
                   textfont = dict(size=new_freq_list,
                                   color=color_list),
                   hoverinfo='text',
                   hovertext=['{0}: {1:.2f}'.format(w, f) for w, f in zip(word_list, freq_list)],
                   mode='text',  
                   text=word_list
                  )
layout = go.Layout({'xaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False},
                    'yaxis': {'showgrid': False, 'showticklabels': False, 'zeroline': False}})

fig = go.Figure(data=[trace], layout=layout)
fig.update_layout(
    width=700,  
    height=700  
    )
py.iplot(fig)

Text fields in the data are presented as an interactive word cloud, which condenses a large amount of text so that the reader can quickly see the most frequent terms. Hovering over a word shows its relative frequency.
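
The font sizes in the cell above come from relative frequencies multiplied by 100. The following is a minimal sketch of that scaling in plain Python, assuming that relative frequency means each word count divided by the largest count. The sample text is hypothetical, and the sketch ignores the stopword removal and other preprocessing that `WordCloud` performs.

```python
from collections import Counter

# Hypothetical sample text standing in for the joined df.text column.
text = "house garden house kitchen garden house"

counts = Counter(text.split())
max_count = max(counts.values())

# Relative frequency: each count divided by the largest count,
# so the most common word gets 1.0.
relative = {word: n / max_count for word, n in counts.items()}

# Same scaling as the cell above: relative frequency times 100.
font_sizes = {word: freq * 100 for word, freq in relative.items()}

# List words from largest to smallest font size.
for word in sorted(font_sizes, key=font_sizes.get, reverse=True):
    print(f"{word}: {font_sizes[word]:.1f}")
```

With this scaling the most frequent word is always drawn at size 100, and every other word shrinks in proportion to how often it appears.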

## Interactive Line Chart

In [None]:
#| echo: true 
#| code-fold: true
dfdate = df.groupby("lastSoldOn")["pricePerSqft"].mean().reset_index()
fig = go.Figure(data=[go.Scatter(x=dfdate['lastSoldOn'], y=dfdate['pricePerSqft'], mode='lines')])
fig.update_layout(title='Average Price per Square Foot by Sale Date', xaxis_title='Date', yaxis_title='Average price per sqft')
fig.show()

An interactive time series chart shows the average price per square foot for each last-sold date. Readers can hover over or zoom into the chart to inspect individual dates. The line chart shows that the average price fluctuates over time.
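
To make the aggregation behind the line chart concrete, here is a small sketch in plain Python of what grouping by date and taking the mean produces. The records are hypothetical, not taken from the dataset.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (sale date, price per sqft) records standing in for df.
sales = [
    (date(2023, 1, 5), 200.0),
    (date(2023, 1, 5), 300.0),
    (date(2023, 1, 6), 180.0),
    (date(2023, 1, 4), 220.0),
]

# Collect every price observed on the same date.
by_date = defaultdict(list)
for sold_on, price in sales:
    by_date[sold_on].append(price)

# One (date, mean price) point per date, in chronological order.
daily_mean = [(d, mean(prices)) for d, prices in sorted(by_date.items())]

for d, avg in daily_mean:
    print(d.isoformat(), avg)
```

Each date contributes exactly one point to the line, so a date with a single unusual sale can produce a sharp spike.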

## Correlation coefficient graph

In [17]:
#| echo: true 
#| code-fold: true
py.init_notebook_mode(connected=True)
dfc = df.loc[:,["beds","baths","baths_full","baths_half","garage","lot_sqft","pricePerSqft","stories"]]
corr = dfc.corr()
matrix_cols = corr.columns.tolist()
corr_array = np.array(corr)
trace = go.Heatmap(x=matrix_cols,
                  y=matrix_cols,
                  z=corr_array,
                  colorscale="Viridis",
                  colorbar=dict())
layout = go.Layout(dict(title="Correlation Matrix for variables"),
                  margin=dict(r = 0 ,
                              l = 100,
                              t = 0,
                              b = 100),
                   yaxis=dict(tickfont=dict(size = 9)),
                   xaxis=dict(tickfont=dict(size = 9)),
                  )
fig = go.Figure(data=[trace], layout=layout)
fig.update_layout(
    width=500,
    height=500  
)
py.iplot(fig)

An interactive correlation heatmap shows the pairwise relationships among the number of bedrooms, the total number of bathrooms, the number of full bathrooms, the number of half bathrooms, the number of garage spaces, the lot size, the price per square foot, and the number of stories. A correlation coefficient greater than 0 indicates a positive correlation between two variables, and a coefficient less than 0 indicates a negative correlation.
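
As an illustration of how the sign of a coefficient arises, the sketch below computes the Pearson correlation (the default method of pandas `DataFrame.corr`) by hand. The `pearson` helper and the sample values are hypothetical and serve only to show one positive and one negative case.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values, not taken from the dataset.
beds = [1, 2, 3, 4]
baths = [1, 1, 2, 3]
price_per_sqft = [400, 350, 300, 250]

r_pos = pearson(beds, baths)           # both rise together: positive
r_neg = pearson(beds, price_per_sqft)  # one rises as the other falls: negative

print(f"beds vs baths: {r_pos:.3f}")
print(f"beds vs price_per_sqft: {r_neg:.3f}")
```

The coefficient always lies between -1 and 1, and values near 0 indicate little linear relationship.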

## Interactive scatterplot

In [18]:
#| echo: true 
#| code-fold: true
fig = px.scatter(
    df,
    x="pricePerSqft",
    y="lot_sqft",
    color="type",
)
fig.update_layout(
    width=500,
    height=500
)
py.iplot(fig)

An interactive scatterplot shows the relationship between parcel size and house price per unit area. The plot suggests a positive correlation between the two.

## Interactive Box Diagram

In [19]:
#| echo: true 
#| code-fold: true
fig3 = px.box(df, x="type", y="pricePerSqft")
fig3.show()

Box plots show the distribution of house price per unit area for each parcel type, including the minimum, maximum, median, and other summary statistics. The figure shows that the distribution of house price per unit area differs across parcel types.
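
The statistics a box plot summarizes can be computed directly. The following sketch uses hypothetical type names and prices and the `inclusive` quantile method from the standard library. Plotly may use a different quartile algorithm and may draw extreme values as separate outlier points, so the numbers are illustrative rather than an exact match for the figure.

```python
from statistics import quantiles

# Hypothetical price-per-sqft samples for two hypothetical type labels.
prices_by_type = {
    "condo": [310.0, 280.0, 350.0, 295.0, 330.0],
    "single_family": [210.0, 190.0, 260.0, 230.0, 205.0],
}


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


for type_name, prices in prices_by_type.items():
    print(type_name, five_number_summary(prices))
```

Comparing the summaries side by side shows whether one type is priced consistently higher or simply more spread out.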

## Interactive bar chart

In [None]:
#| echo: true 
#| code-fold: true
df1 = df["type"].value_counts().reset_index()
df1.columns = ["type","Count"]
fig = px.bar(df1,
             x="Count",
             y="type",  # one horizontal bar per type
             text="Count",
             orientation="h")
fig.show()

An interactive bar chart shows the number of parcels of each type.