<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>


# LinkedIn - Follow number of content published
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/LinkedIn/Linkedin_Follow_number_of_content_published.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/open_in_naas.svg"/></a>

**Tags:** #linkedin #html #plotly #csv #image #content #analytics #automation

**Author:** [Sanjeet Attili](https://www.linkedin.com/in/sanjeet-attili-760bab190/)

With this notebook, you can follow how the number of posts published on your personal LinkedIn profile evolves: the cumulative total is computed day by day since your first activity.

## Input

### Import libraries

In [None]:
from naas_drivers import linkedin
import naas
import pandas as pd
from datetime import datetime
import plotly.graph_objects as go

### Setup LinkedIn
- [Get your cookies](/d20a8e7e508e42af8a5b52e33f3dba75)

In [None]:
# LinkedIn cookies
LI_AT = "AQEDARCNSioDe6wmAAABfqF-HR4AAAF-xYqhHlYAtSu7EZZEpFer0UZF-GLuz2DNSz4asOOyCRxPGFjenv37irMObYYgxxxxxxx"
JSESSIONID = "ajax:12XXXXXXXXXXXXXXXXX"

# Enter profile URL
PROFILE_URL = "PROFILE_URL"
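
The cookies above grant access to your LinkedIn session, so you may prefer not to hardcode them in a shared notebook. A minimal sketch, assuming the values are exposed as environment variables named `LI_AT` and `JSESSIONID` (these variable names and the `load_cookies` helper are hypothetical, not part of naas):

```python
import os


def load_cookies(env=None):
    """Return (li_at, jsessionid) read from an environment-style mapping."""
    env = os.environ if env is None else env
    # Collect every missing or empty variable so the error names them all
    missing = [name for name in ("LI_AT", "JSESSIONID") if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variable(s): {', '.join(missing)}")
    return env["LI_AT"], env["JSESSIONID"]


# Demo with a stand-in mapping instead of the real environment
li_at, jsessionid = load_cookies({"LI_AT": "xxx", "JSESSIONID": "ajax:xxx"})
print(jsessionid)
```

Calling `load_cookies()` without an argument reads from `os.environ`.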

### Setup Outputs

In [None]:
# Outputs
name_output = "LinkedIn_Total_content_published"
csv_output = f"{name_output}.csv"
html_output = f"{name_output}.html"
image_output = f"{name_output}.png"

### Setup Naas

In [None]:
# Schedule your notebook every day at 9:00 AM
naas.scheduler.add(cron="0 9 * * *")

#-> Uncomment the line below to remove your scheduler
# naas.scheduler.delete()
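
The `cron` string passed to the scheduler follows the standard five-field cron layout. The sketch below only labels the fields of such an expression so that `"0 9 * * *"` is easier to read and adapt; `describe_cron` is a hypothetical helper, not a naas function.

```python
# Field names of a standard five-field cron expression, in order
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


print(describe_cron("0 9 * * *"))
```

Here minute `0` and hour `9` select 9:00, and the three `*` fields mean every day of the month, every month, and every day of the week.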

## Model

### Get posts feed

In [None]:
# Get posts feed from the CSV stored locally (returns an empty dataframe if the CSV does not exist)
def get_past_feeds(csv_output):
    try:
        df = pd.read_csv(csv_output)
    except FileNotFoundError:
        # First run: no history yet, so return an empty dataframe
        return pd.DataFrame()
    return df

df_posts_feed = get_past_feeds(csv_output)
df_posts_feed
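
To see how this fallback behaves, the sketch below runs the same pattern twice against a temporary folder: once before the CSV exists and once after. The `load_history` name and the example URL are placeholders chosen for illustration.

```python
import os
import tempfile

import pandas as pd


def load_history(csv_path):
    """Return the stored posts, or an empty DataFrame on the first run."""
    try:
        return pd.read_csv(csv_path)
    except FileNotFoundError:
        return pd.DataFrame()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "history.csv")
    first_run = load_history(path)  # the file does not exist yet
    pd.DataFrame({"POST_URL": ["https://example.com/post/1"]}).to_csv(path, index=False)
    second_run = load_history(path)  # the file now exists

print(len(first_run), len(second_run))
```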

### Get new posts

In [None]:
def get_posts(df):
    # Get last post URL in dataframe
    if len(df) == 0:
        last_post_url = None
    else:
        last_post_url = df.POST_URL[0]
    # Get new posts since last url (this part is important to optimize script performance)
    until = {}
    if last_post_url:
        until = {"POST_URL": last_post_url}

    df_posts_feed = linkedin.connect(LI_AT, JSESSIONID).profile.get_posts_feed(PROFILE_URL, until=until, limit=-1)

    # Merge dataframes (DataFrame.append was removed in pandas 2.0, so use pd.concat)
    merge_df = pd.concat([df, df_posts_feed], ignore_index=True)
    merge_df.sort_values(by='PUBLISHED_DATE', ascending=False, inplace=True)

    # Keep a single row per post
    merge_df.drop_duplicates('POST_URL', keep='last', inplace=True)

    # Return a clean 0..n-1 index so the latest post sits at position 0
    return merge_df.reset_index(drop=True)

merged_df = get_posts(df_posts_feed)
merged_df
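
The merge step can be tried without a LinkedIn connection. This is a minimal sketch on made-up data, assuming the fresh fetch may return the last known post again; the URLs and dates are placeholders, and only the `POST_URL` and `PUBLISHED_DATE` columns are modelled.

```python
import pandas as pd

# Hypothetical stored history (newest first), as it would be read from the CSV
stored = pd.DataFrame({
    "POST_URL": ["https://example.com/p2", "https://example.com/p1"],
    "PUBLISHED_DATE": ["2022-01-02 10:00:00+0000", "2022-01-01 09:00:00+0000"],
})

# Hypothetical fresh fetch: one new post plus the last known post again
fetched = pd.DataFrame({
    "POST_URL": ["https://example.com/p3", "https://example.com/p2"],
    "PUBLISHED_DATE": ["2022-01-03 08:00:00+0000", "2022-01-02 10:00:00+0000"],
})

merged = (
    pd.concat([stored, fetched], ignore_index=True)
    .drop_duplicates("POST_URL", keep="last")  # keep the freshly fetched copy
    .sort_values("PUBLISHED_DATE", ascending=False)
    .reset_index(drop=True)
)
print(merged["POST_URL"].tolist())
```

Dropping duplicates before sorting makes `keep="last"` deterministic: the fetched rows come after the stored ones in the concatenation, so the most recent copy of each post is the one retained.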

### Get trend

In [None]:
# Create dataframe with the cumulative number of LinkedIn posts by date
# -> Dates without posts are filled with 0, so the cumulative value carries forward

def get_trend(posts_df):

    # Count activity_id instead of content_id, as
    # content_id seems to be None for some types of content.

    df = posts_df.copy()
    date_col_name = 'PUBLISHED_DATE'
    value_col_name = "ACTIVITY_ID"

    # Format date: drop the timezone offset, then keep the day only
    # (vectorized, so it does not depend on the dataframe index)
    df[date_col_name] = df[date_col_name].astype(str).str.split('+').str[0]
    df[date_col_name] = pd.to_datetime(df[date_col_name]).dt.strftime("%Y-%m-%d")

    # Count posts per day
    df = df.groupby(date_col_name, as_index=False).agg({value_col_name: "count"})

    # Build a continuous daily range from the first post to today
    d = datetime.now().date()
    d2 = df.loc[df.index[0], date_col_name]
    idx = pd.date_range(d2, d, freq="D")

    df.set_index(date_col_name, drop=True, inplace=True)
    df.index = pd.DatetimeIndex(df.index)
    df = df.reindex(idx, fill_value=0)
    df[date_col_name] = pd.DatetimeIndex(df.index)

    # Compute the cumulative sum
    df["value_cum"] = df[value_col_name].cumsum()
    df.drop(columns=value_col_name, inplace=True)
    return df.reset_index(drop=True)

df_trend = get_trend(merged_df)
df_trend
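
The core of the trend computation (count per day, fill the missing days with 0, then accumulate) can be checked on a few made-up timestamps. This is a simplified sketch: `cumulative_by_day` is a hypothetical helper, and it takes an explicit `end` date instead of today's date so the result is reproducible.

```python
import pandas as pd


def cumulative_by_day(dates, end):
    """Count items per day, fill missing days with 0, and accumulate."""
    days = pd.to_datetime(pd.Series(dates)).dt.normalize()
    daily = days.value_counts().sort_index()
    # Continuous daily range from the first item to the end date
    full_range = pd.date_range(daily.index.min(), pd.Timestamp(end), freq="D")
    daily = daily.reindex(full_range, fill_value=0)
    return daily.cumsum()


trend = cumulative_by_day(
    ["2022-01-01 09:00:00", "2022-01-01 18:30:00", "2022-01-04 08:00:00"],
    end="2022-01-05",
)
print(trend.tolist())
```

Two posts on January 1 and one on January 4 give a cumulative series that stays flat on the days without posts, which is what produces the step-like line in the chart.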

## Output


### Display linechart

In [None]:
def create_linechart(df, label, value, title):
    # Init
    fig = go.Figure()
    
    # Create fig
    fig.add_trace(
        go.Scatter(
            x=df[label],
            y=df[value],
            mode="lines",
        )
    )
    # The trace is drawn with mode="lines", so set the line color (marker_color has no visible effect)
    fig.update_traces(line_color='black')
    fig.update_layout(
        title=title,
        title_font=dict(family="Arial", size=18, color="black"),
        plot_bgcolor="#ffffff",
        width=1200,
        height=800,
        paper_bgcolor="white",
        margin_pad=10,
    )
    fig.show()
    return fig

fig = create_linechart(df_trend, label="PUBLISHED_DATE", value="value_cum", title='Total Content Published')

### Save and share your CSV file

In [None]:
# Save your dataframe in CSV
merged_df.to_csv(csv_output, index=False)

# Share output with naas
naas.asset.add(csv_output)

#-> Uncomment the line below to remove your asset
# naas.asset.delete(csv_output)

### Save and share your graph in HTML

In [None]:
# Save your graph in HTML
fig.write_html(html_output)

# Share output with naas
naas.asset.add(html_output, params={"inline": True})

#-> Uncomment the line below to remove your asset
# naas.asset.delete(html_output)

### Save and share your graph as an image

In [None]:
# Save your graph in PNG
fig.write_image(image_output)

# Share output with naas
naas.asset.add(image_output, params={"inline": True})

#-> Uncomment the line below to remove your asset
# naas.asset.delete(image_output)