I created this project to build a data visualization that lets our marketing team explore my company's sales record. At the beginning of this course I approached our Sales Manager, Jeff, with the idea of creating a visualization that he could use to help drive his marketing efforts. After sitting down and discussing the project, we concluded that he wanted a tool that would let him look at the company's sales record and easily understand the relative number of jobs sold in a given time frame. My original sketch placed the full historic sales record in a panel along the bottom of the document, with a main window showing a smaller snapshot of the selected data. To construct this project I would have to obtain a copy of our sales record, clean the data to remove inconsistent or incomplete entries, and create an Altair chart to display it.

Obtaining the sales record proved easy, as I was able to pull all of the information I needed with a database query to our CRM. However, many of the fields did not have a consistent input style, and I spent roughly 5 hours of my initial construction time cleaning the data. I used R for this task, and once the data was in a more usable state I exported it to a CSV file. This CSV was then loaded into this Jupyter notebook, and the final cleaning steps were performed as seen in the code below.

In [2]:
# Import dependencies
import pandas as pd
import altair as alt
import numpy as np


In [3]:
data = pd.read_csv("Geolocated Data - Sales & Prospect - 5.11.2023.csv", low_memory=False)

# Drop the columns that neither the prospect nor the sales dataframe uses
df = data.drop(['id','FullAddress', 'StreetName', 'JobStatus', 'SalesRepName1', 'SalesRepName2', 'Accuracy.Score', 'Accuracy.Type'], axis = 1)

# Prospect is defined by not having a contract date
prospect = df[df.ContractDate.isnull()]
sales = df.dropna(subset = ['ContractDate'])

# Drop unused features
sales = sales.drop(['DateAdded', 'Issued', 'Sat'], axis = 1)
prospect = prospect.drop(['GrossAmount', 'ContractDate','Source'], axis = 1)

# Drop rows with incomplete data
prospect = prospect.dropna()

# Remove zip codes that are not 5 digits (certain entries were throwing exceptions when converting to int)
prospect = prospect[prospect['Zip'].str.contains(r'^\d{5}$')]

# Keep only New Mexico sales with a positive contract amount
sales = sales[(sales['GrossAmount'] > 0) & (sales['State'] == 'NM')]

# Drop rows with incomplete data
sales = sales.dropna()
# Adjust the type of various columns
sales = sales.astype({'productid':"str",
                      'City':'str',
                      'State':'str',
                      'Source':'str',
                      'SubSource':'str',
                      'Zip':'int'})

prospect = prospect.astype({'productid':'str',
                            'City':'str',
                            'State':'str',
                            'SubSource':'str'})
                            #'Zip':'int'})

# Set date field to use the datetime type
sales['ContractDate'] = pd.to_datetime(sales['ContractDate'])
#prospect['DateAdded'] = pd.to_datetime(df.DateAdded)

# Adjust the labels for sunroom products
sales = sales.replace({'SR-10':'Sunroom',
                    'SR-11':'Sunroom',
                    'SR-12':'Sunroom',
                    'SR-13':'Sunroom',
                    'SR-16':'Sunroom',
                    'SR-19':'Sunroom',
                    'SR-20':'Sunroom',
                    'SR-21':'Sunroom',
                    'SR-22':'Sunroom',
                    'SR-23':'Sunroom',
                    'SR-24':'Sunroom',
                    'SR-25':'Sunroom',
                    'SR-3':'Sunroom',
                    'SR-4':'Sunroom',
                    'SR-5':'Sunroom',
                    'SR-6':'Sunroom',
                    'SR-9':'Sunroom',
                    'Sun':'Sunroom',
                    'PC':'Patio cover',
                    'Win':'Window',
                    'Sid': 'Other',
                    'Stucco':'Other',
                    'Roof':'Other',
                    'SF':'Other',
                    'Deck':'Other'
})
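
The sunroom relabeling above lists every `SR-` code by hand. A minimal sketch of a more compact alternative follows, assuming that every code of the form `SR-<number>` is a sunroom (the original table only names specific codes, so this is an assumption). The helper name `normalize_product` and the constants are hypothetical and are not part of the notebook.

```python
import re

# Labels copied from the replacement table above
PRODUCT_LABELS = {
    "Sun": "Sunroom",
    "PC": "Patio cover",
    "Win": "Window",
    "Sid": "Other",
    "Stucco": "Other",
    "Roof": "Other",
    "SF": "Other",
    "Deck": "Other",
}

# Assumption: any code of the form SR-<digits> is a sunroom
SUNROOM_CODE = re.compile(r"SR-\d+")


def normalize_product(code):
    """Map a raw product code to its display label (hypothetical helper)."""
    if SUNROOM_CODE.fullmatch(code):
        return "Sunroom"
    # Codes without an entry in the table are returned unchanged
    return PRODUCT_LABELS.get(code, code)
```

Applied as `sales['productid'] = sales['productid'].map(normalize_product)`, this would touch only the product column, whereas the dataframe-wide `replace` also rewrites matching values in any other column.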


With the data cleaned I was able to build a preliminary visualization using Altair. Initially my visualization was simple and consisted of only the scatter plot and histogram of our sales data. Limiting the number of features at this stage let me prototype the product more rapidly. With the prototype completed I returned to Jeff to review the project and determine whether it was sufficient for his needs. As we were reviewing the project, Jeff mentioned that he was starting a new advertising campaign that consisted of mailing flyers to various houses in our city. I suggested we add a visual element to the existing project that would display the geographic location of our previous sales overlaid on a map of New Mexico. After adding the new feature I was left with the following section of code.

In [4]:
alt.data_transformers.disable_max_rows()

# Brush over the date (x) axis only; it drives the date range of the other charts
interval = alt.selection_interval(encodings=['x'])

base = alt.Chart(sales).mark_point().encode(
    x = 'ContractDate:T',
    y = alt.Y('GrossAmount',axis = alt.Axis(title='Contract Price')),
    color = alt.Color('productid', legend=None)
)

chart = base.encode(
    x = alt.X('ContractDate:T', scale = alt.Scale(domain = interval)),
    tooltip = ['ContractDate', 'GrossAmount','productid', 'Zip']
).properties(
    width = 1000,
    height = 600,
    title = 'Historic Sales Record', 
).transform_filter(
    interval
)

hist = alt.Chart(sales).mark_bar().encode(
    y=alt.Y('count()', axis = alt.Axis(title=None)),
    x= alt.X('productid', axis = alt.Axis(title=None, labelAngle = -45)),
    color = 'productid',
    tooltip = ['count()', 'sum(GrossAmount)']
).properties(
    width=100,
    height=600,
    title = 'Projects sold'
).transform_filter(
    interval
)

view = alt.Chart(sales).mark_bar(size=1).encode(
    x = alt.X('ContractDate:T',
        axis = alt.Axis(title = 'Year')),
    y = alt.Y('sum(GrossAmount)', axis = alt.Axis(title = 'Total sales')),
    color = alt.Color('productid')
).add_params(
    interval
).properties(
    width = 1700,
    height = 150,
    title = "Click and drag to select a time frame. Double click to view full sales history."
)

url = 'https://raw.githubusercontent.com/deldersveld/topojson/master/countries/us-states/NM-35-new-mexico-counties.json'
data_map = alt.topo_feature(url, "cb_2015_new_mexico_county_20m")

# Base layer: New Mexico county outlines (named state_map to avoid shadowing the built-in map)
state_map = alt.Chart(data_map).mark_geoshape(
    fill='lightgray',
    stroke='white'
).project('mercator').properties(
    width=500,
    height=600,
    title = 'Geographic location of sales'
)

# Overlay: one circle per sale in the selected interval
geo = alt.Chart(sales).mark_circle().encode(
    longitude= 'Longitude:Q',
    latitude= 'Latitude:Q',
    size= alt.value(10),
    color = 'productid'
).transform_filter(
    interval
)

# Top row: scatter plot, product histogram, and map; bottom row: interval selector
(chart | hist | (state_map + geo)) & (view)


The final visualization consists of a scatter plot that charts the contract price of all sales over a selected time frame, a histogram that shows the number of projects sold in the selected time frame, a composite map that plots the geographic location of all sales in the selection, and finally a histogram along the bottom that shows the total value of all jobs sold on a particular day. The histogram along the bottom of the visualization can be used to select an interval of time, and this selection dynamically updates the other charts in the visualization. The screenshots below offer a glimpse of how these features work in the working model.

The image below shows the tooltip of the scatter plot, which lets the user view the date a project was sold, the price it was sold for, the type of project, and the zip code of the job site.

In [7]:
from IPython.display import Image
Image(url="Screenshot (973).png")

This screenshot shows the tooltip behavior for the histogram of sales. Here you can see the total number of jobs in the selection grouped by product category, as well as the total gross revenue for each product category.

In [8]:
Image(url="Screenshot (974).png")

The screenshot below shows the interval selection feature of the visualization. By clicking and dragging over a range of data points in the bottom section of the visualization you can dynamically change the range of data displayed. You can scroll the mouse wheel to expand or contract the selected interval and drag the selection to different time periods.

In [9]:
Image(url="Screenshot (975).png")

After showing the final product to Jeff, we brought a laptop loaded with my model to our weekly department meeting and showed it to the owner of the company and the head of production as a kind of final evaluation. Overall the data displayed was well received, but a few interactions were not intuitive for the users in my trial. The largest complaint was that there was no way to filter the data other than selecting an interval. Both our head of production and head of sales suggested altering the model to allow selection of a single product, as well as implementing some sort of date entry system to set the beginning and end of the selection interval. Another critique was that the map section of the visualization was interesting, but the scale of the image did not allow for easy interpretation of the data. Our owner suggested adding the ability to zoom into different cities, or to filter the data to show only certain cities.

Overall the model proved to be a success in terms of the initial implementation, but there are several changes I would like to make on my own time. I think filters that allow the selection of a specific product, time range, or city would be great additions, and I plan to implement these features in the future. Another area that needs improvement is the actual deployment of the model. Currently my model only runs locally, but hosting it (as was recommended in the course) would allow more people to use it as an actual tool in their day-to-day work. Another feature that may be interesting to add is a change to the map so that the color of each point displays the density of jobs sold. I have seen other implementations of this type of system, and I believe it would allow us to better pinpoint locations in the state where we would like to advertise in the future. Finally, Jeff, our sales manager, asked if I could create a similar visualization that would allow him to see the areas of our state to which we have recently sent advertising. This future project would be a good way to practice the skills that I have learned in this course, as well as improve my ability to implement visual models of real-world data.
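
As a starting point for the density idea, the sketch below counts how many jobs fall into the same grid cell and attaches that count to every sale. It is a rough illustration under stated assumptions: the sample coordinates are made up, the helper `add_job_density` and the column `jobs_nearby` are hypothetical names, and the 0.1 degree cell size is an arbitrary choice.

```python
import pandas as pd

# Hypothetical sample of geolocated sales (three near Albuquerque,
# one near Santa Fe, one near Las Cruces)
demo = pd.DataFrame({
    "Latitude": [35.08, 35.09, 35.05, 35.68, 32.31],
    "Longitude": [-106.65, -106.62, -106.61, -105.94, -106.78],
})


def add_job_density(df, cell_deg=0.1):
    """Attach to each row the number of jobs in its cell_deg x cell_deg grid cell."""
    out = df.copy()
    # Snap each coordinate to the index of its grid cell
    out["lat_cell"] = (out["Latitude"] // cell_deg).astype(int)
    out["lon_cell"] = (out["Longitude"] // cell_deg).astype(int)
    # Count the jobs that share the same cell
    out["jobs_nearby"] = (
        out.groupby(["lat_cell", "lon_cell"])["Latitude"].transform("count")
    )
    return out.drop(columns=["lat_cell", "lon_cell"])


dense = add_job_density(demo)
```

The resulting `jobs_nearby` column could then be encoded as a quantitative color (for example `color='jobs_nearby:Q'`) on the circle layer of the map in place of the product color.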