<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# HubSpot - Create sales dataset

**Tags:** #hubspot #crm #sales #deal #naas_drivers #notification #asset #emailbuilder #scheduler #naas #analytics #automation #email #text #plotly #html #image

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Last update:** 2023-04-12 (Created: 2022-02-21)

**Description:** This notebook builds a sales dataset from your HubSpot deals and pipeline stages and saves it as a CSV file.

## Input

### Import libraries

In [None]:
import naas
from naas_drivers import hubspot
import os
import pandas as pd
from datetime import datetime

### Setup variables

In [None]:
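# HubSpot access token, read from the Naas secret store with a placeholder fallback
hs_access_token = naas.secret.get("HS_ACCESS_TOKEN") or "YOUR_HS_ACCESS_TOKEN"
# ID of the HubSpot pipeline to extract (kept as a string)
pipeline_id = "8432671"
# Deal properties to request from HubSpot
properties = [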
    "hs_object_id",
    "dealname",
    "dealstage",
    "pipeline",
    "createdate",
    "hs_lastmodifieddate",
    "closedate",
    "amount",
]
output_file_path = "/home/ftp/__abi__/outputs/by_tools/hubspot/df_deals.csv"

## Model

### Get all pipelines and dealstages

In [None]:
df_pipelines = hubspot.connect(hs_access_token).pipelines.get_all()
print("Rows:", len(df_pipelines))
df_pipelines.head(1)

### Filter on pipeline ID to get dealstages

In [None]:
df_dealstages = df_pipelines.copy()
# Filter on pipeline
df_dealstages = df_dealstages[df_dealstages.pipeline_id == pipeline_id]

print("Rows:", len(df_dealstages))
df_dealstages
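The filter above compares `pipeline_id` against a string. As a minimal sketch on made-up data (the `filter_pipeline` helper and the toy values are hypothetical and not part of `naas_drivers`), casting both sides to `str` avoids a silent empty result if the column were ever loaded as integers:

```python
import pandas as pd


def filter_pipeline(df: pd.DataFrame, pipeline_id: str) -> pd.DataFrame:
    """Keep rows whose pipeline_id matches, comparing both sides as strings."""
    mask = df["pipeline_id"].astype(str) == str(pipeline_id)
    return df[mask].reset_index(drop=True)


# Toy data (assumption): pipeline IDs stored as integers instead of strings
toy_pipelines = pd.DataFrame(
    {
        "pipeline_id": [8432671, 8432671, 1111111],
        "dealstage_id": ["s1", "s2", "s3"],
    }
)

toy_stages = filter_pipeline(toy_pipelines, "8432671")
print("Rows:", len(toy_stages))
```

If the filtered dataframe comes back empty in the real notebook, checking the dtype of `pipeline_id` is a reasonable first step.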

### Get deals from pipeline

In [None]:
df_deals = hubspot.connect(hs_access_token).deals.get_all(properties)

# Filter on pipeline
df_deals = df_deals[df_deals.pipeline == pipeline_id].reset_index(drop=True)

print("Rows:", len(df_deals))
df_deals

### Prep deal stages dataset

In [None]:
df_dealstages_c = df_dealstages.copy()
to_drop = [
    "createdAt",
    "updatedAt",
    "archived",
    "dealclosed"
]
df_dealstages_c = df_dealstages_c.drop(to_drop, axis=1)
df_dealstages_c

### Create sales pipeline database

In [None]:
df_sales = pd.merge(
    df_deals,
    df_dealstages_c,
    left_on="dealstage",
    right_on="dealstage_id",
    how="left",
)
print("Rows:", len(df_sales))
df_sales
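A left join keeps every deal, even when its `dealstage` has no match in the stages table. The sketch below uses toy stand-ins for `df_deals` and `df_dealstages_c` (all values are made up) and the `indicator` parameter of `pd.merge` to list deals that found no matching stage:

```python
import pandas as pd

# Toy stand-ins for df_deals and df_dealstages_c (values are made up)
toy_deals = pd.DataFrame(
    {"dealname": ["A", "B", "C"], "dealstage": ["s1", "s2", "s9"]}
)
toy_dealstages = pd.DataFrame(
    {
        "dealstage_id": ["s1", "s2"],
        "dealstage_label": ["Qualified", "Proposal"],
        "probability": ["0.2", "0.6"],
    }
)

toy_sales = pd.merge(
    toy_deals,
    toy_dealstages,
    left_on="dealstage",
    right_on="dealstage_id",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Deals whose stage ID was not found in the stages table
unmatched = toy_sales[toy_sales["_merge"] == "left_only"]
print("Deals without a matching stage:", unmatched["dealname"].tolist())
```

Unmatched deals end up with missing `dealstage_label` and `probability` values, which matters for the forecast calculation later on.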

### Cleaning database

In [None]:
df_sales_c = df_sales.copy()

to_order = [
    "createdate",
    "dealname",
    "amount",
    "closedate",
    "dealstage_label",
    "displayOrder",
    "probability",
    "hs_lastmodifieddate",
    "hs_object_id"
]
# Copy the column selection so later assignments do not raise SettingWithCopyWarning
df_sales_c = df_sales_c[to_order].copy()

# Cleaning
df_sales_c["amount"] = df_sales_c["amount"].fillna("0")
df_sales_c.loc[df_sales_c["amount"] == "", "amount"] = "0"

# Formatting
df_sales_c["amount"] = df_sales_c["amount"].astype(float)
df_sales_c["probability"] = df_sales_c["probability"].astype(float)
df_sales_c["createdate"] = pd.to_datetime(df_sales_c["createdate"])
df_sales_c["hs_lastmodifieddate"] = pd.to_datetime(df_sales_c["hs_lastmodifieddate"])
df_sales_c["closedate"] = pd.to_datetime(df_sales_c["closedate"])

# Forecasted amount: deal amount weighted by the stage's win probability
df_sales_c["forecasted"] = df_sales_c["amount"] * df_sales_c["probability"]

print("Rows:", len(df_sales_c))
df_sales_c
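The cleaning cell above replaces missing and empty amounts with `"0"` before casting to float. An alternative sketch on toy data uses `pd.to_numeric` with `errors="coerce"`, which handles both cases in one step. Treating a missing probability as `0` is an assumption made here for illustration; the cell above would leave it as `NaN`.

```python
import pandas as pd

# Toy data (made up): amounts and probabilities arrive as strings, some missing
toy = pd.DataFrame(
    {
        "amount": ["1000", "", None, "250.5"],
        "probability": ["0.5", "0.2", "1.0", None],
    }
)

# errors="coerce" turns "", None and other unparsable values into NaN
toy["amount"] = pd.to_numeric(toy["amount"], errors="coerce").fillna(0.0)
# Assumption: a deal with no stage probability contributes nothing to the forecast
toy["probability"] = pd.to_numeric(toy["probability"], errors="coerce").fillna(0.0)

# Weighted forecast, same formula as in the notebook
toy["forecasted"] = toy["amount"] * toy["probability"]
print(toy)
```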

## Output

### Save dataframe to csv

In [None]:
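os.makedirs(os.path.dirname(output_file_path), exist_ok=True)  # create the output folder if missing
df_sales_c.to_csv(output_file_path, index=False)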
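CSV files do not store column types, so date columns come back as plain strings unless they are parsed on load. The sketch below writes a toy dataframe to a temporary folder (the path and values are made up) and reads it back with `parse_dates`:

```python
import os
import tempfile

import pandas as pd

# Toy dataframe (made up) with one datetime column
toy = pd.DataFrame(
    {
        "dealname": ["A"],
        "createdate": pd.to_datetime(["2023-01-15 10:00:00"]),
        "amount": [1000.0],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "outputs", "df_deals.csv")
    # Create the parent folder before writing
    os.makedirs(os.path.dirname(toy_path), exist_ok=True)
    toy.to_csv(toy_path, index=False)
    # parse_dates restores the datetime dtype on load
    reloaded = pd.read_csv(toy_path, parse_dates=["createdate"])

print(reloaded.dtypes)
```

In the real notebook, the same `parse_dates` argument would list `createdate`, `closedate` and `hs_lastmodifieddate`.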