# LP DAAC GitHub Resources Metrics

This notebook summarizes and visualizes metrics for the GitHub-hosted data resources of the NASA Land Processes Distributed Active Archive Center (LP DAAC). These resources include text-based guides explaining how to work with NASA tools and services, as well as tutorials and scripts written to facilitate use of the data. Resources are organized by product or tool across 7 repositories.

| Resource Repository | Summary | Services and Tools |
|----|-----|----|
|[LP DAAC Data Resources](https://github.com/nasa/LPDAAC-Data-Resources) |Generalized examples for finding, accessing, and working with data archived by LP DAAC |Tutorials, AppEEARS API, Direct S3 Access |
|[AppEEARS Data Resources](https://github.com/nasa/AppEEARS-Data-Resources) |How to use the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS) |Tutorials, AppEEARS API, Direct S3 Access |
|[ECOSTRESS Data](https://github.com/nasa/ECOSTRESS-Data-Resources)|How to find, access, and work with ECOSTRESS data (The ECOsystem Spaceborne Thermal Radiometer Experiment on Space Station)|Tutorials, Scripts, Direct S3 Access|
|[EMIT Data](https://github.com/nasa/EMIT-Data-Resources) |How to find, access, and work with EMIT data (Earth Surface Mineral Dust Source Investigation)|Tutorials, Scripts, Direct S3 Access |
|[GEDI Data](https://github.com/nasa/GEDI-Data-Resources) |How to find, access, and work with GEDI data (Global Ecosystem Dynamics Investigation)|Tutorials |
|[HLS Data](https://github.com/nasa/HLS-Data-Resources)|How to find, access, and work with HLS data (Harmonized Landsat Sentinel-2)|Tutorials, Scripts, Direct S3 Access|
|[VITALS](https://github.com/nasa/VITALS)|How to find and work with EMIT and ECOSTRESS data together |Tutorials|

GitHub traffic metrics capture views, unique views, clones, unique clones, referrers, and popular repository content. Each report covers only the most recent 14-day period, so we capture these metrics every 14 days. Views, unique views, clones, and unique clones each have a single date associated with them. Referrers and popular content have no date associated and are aggregated over the previous 14-day window. This schedule gives daily insight into views and clones, but only a 14-day summary of referrers and popular content.
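
To make the collection schedule concrete, the sketch below lists the date range each capture would cover. It assumes that a capture's window is the 14 days ending on the capture date and that captures happen exactly 14 days apart. The function name `capture_windows` and the example dates are hypothetical and are not part of the collection scripts.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # length of the traffic window described above


def capture_windows(first_capture, n_captures, step_days=WINDOW_DAYS):
    """Return the (start, end) dates covered by each capture.

    Assumption: each window is the 14 days ending on the capture date.
    """
    windows = []
    for i in range(n_captures):
        end = first_capture + timedelta(days=i * step_days)
        start = end - timedelta(days=WINDOW_DAYS - 1)
        windows.append((start, end))
    return windows


# Hypothetical schedule: three captures starting on 2025-01-14
windows = capture_windows(date(2025, 1, 14), 3)
for start, end in windows:
    print(f"{start} to {end}")
```

Under these assumptions the windows sit back to back, so daily views and clones can be stitched into a continuous series, while referrers and popular content remain one summary per window.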

In [None]:
import os
import pandas as pd
import hvplot.pandas
import panel as pn
import holoviews as hv
from bokeh.palettes import Colorblind as colormap
hv.extension('bokeh')

In [None]:
# Open Data
traffic = pd.read_csv("traffic_metrics.csv", parse_dates=['date'])
paths = pd.read_csv("popular_content.csv", parse_dates=['date'])
refs = pd.read_csv("referrers.csv", parse_dates=['date'])

In [None]:
# View Traffic
def view_traffic(df, freq: str = "month"):
    """
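    df:     traffic dataframe with "date" and "repository" columns plus one column per metric
    freq:   "month" to select a month and plot daily values
            "year" to select a year and plot monthly totals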
    """
    # Update Data
    df = df.copy()
    df["year"]  = df["date"].dt.year.astype(str)
    df["month"] = df["date"].dt.to_period("M").astype(str)
       
    options = sorted(df[freq].unique())
    widget = pn.widgets.Select(name=freq.title(),options=options, value=options[-1])
    metric_widget = pn.widgets.Select(name="Metric", options=["views", "unique views", "clones","unique clones"], value="views")

    area_opts = hv.opts.Area(hover_tooltips=[("Repo",  "$name"), ("Date",  "@x{%F}"),("Value", "@y")],hover_formatters={'@x': 'datetime'})

    @pn.depends(sel=widget, metric=metric_widget)
    def _plot(sel, metric):
        if freq == "month":
            data = df[df["month"] == sel]
            pivot = (
                data.pivot_table(index="date",columns="repository",values=metric,fill_value=0)
                .sort_index(axis=1)
            )
            return pivot.hvplot.area(
                stacked=True,
                width=750,
                height=375,
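                # Colorblind palettes start at 3 colors, so request at least 3 and trim
                color=list(colormap[max(len(pivot.columns), 3)])[:len(pivot.columns)],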
                title=f'{metric.title()} in {sel}',
                ylabel=metric.title(),
            ).opts(area_opts)
        else:
            data = df[df["year"] == sel]
            pivot = (
                data
                .groupby(["month","repository"])[metric]
                .sum()
                .unstack(fill_value=0)
                .sort_index(axis=1)
            )
            return pivot.hvplot.area(
                stacked=True,
                width=750,
                height=375,
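                # Colorblind palettes start at 3 colors, so request at least 3 and trim
                color=list(colormap[max(len(pivot.columns), 3)])[:len(pivot.columns)],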
                title=f"Monthly {metric.title()} in {sel}",
                xlabel="Month",
                ylabel=metric.title(),
            ).opts(area_opts)
    return pn.Row(_plot, pn.Column(widget, metric_widget))

## Monthly Traffic Visual

In [None]:
monthly_figure = view_traffic(traffic, 'month')
monthly_figure

In [None]:
#pn.panel(monthly_figure).save('monthly_traffic.html',embed=True)

## Annual Traffic Visual

In [None]:
annual_figure = view_traffic(traffic, 'year')
annual_figure

In [None]:
#pn.panel(annual_figure).save('annual_traffic.html',embed=True)

## Monthly Popular Resource Paths
Note that these numbers are not exact monthly counts. Each monthly total covers 28 days of data, summed from the two 2-week collection periods whose end dates fell within the month.
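
The note above can be illustrated with a minimal plain-Python sketch that buckets 14-day summaries by the month of their end date. The dates and counts are made up for illustration, and the sketch assumes each summary is labeled with the last day of its window.

```python
from collections import defaultdict
from datetime import date

# Hypothetical 14-day summaries: (window end date, count for that window)
snapshots = [
    (date(2025, 1, 14), 120),
    (date(2025, 1, 28), 95),
    (date(2025, 2, 11), 110),  # window begins in late January but ends in February
]

monthly_totals = defaultdict(int)
for end_date, count in snapshots:
    month = end_date.strftime("%Y-%m")  # bucket by the month of the end date
    monthly_totals[month] += count

print(dict(monthly_totals))
```

Here the third window includes the last days of January, yet its full count lands in February, which is why the monthly figures are approximate.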

In [None]:
# Monthly Popular Resource Paths
paths['month'] = paths['date'].dt.to_period('M')
paths['ext'] = paths['path'].apply(lambda p: os.path.splitext(os.path.basename(p))[1])
filtered_paths = paths[paths['ext'] != ''].copy()
monthly_paths = (filtered_paths.groupby(['month','path']).agg(total_count=('count','sum'),total_uniques = ('uniques','sum')).reset_index())
pd.set_option('display.max_colwidth', None)
monthly_paths.sort_values(['month','total_uniques'],ascending=[True,False]).groupby('month').head(10).reset_index(drop=True)
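
The extension filter in the cell above keeps only paths whose last component has a file extension. The short sketch below shows how `os.path.splitext` behaves on a few hypothetical paths written in the style of the popular-content report; the paths and the helper name `file_extension` are illustrative assumptions.

```python
import os

# Hypothetical paths, for illustration only
example_paths = [
    "/nasa/VITALS",                                 # repository landing page
    "/nasa/VITALS/tree/main/python",                # directory listing
    "/nasa/VITALS/blob/main/python/example.ipynb",  # notebook file
    "/nasa/VITALS/blob/main/.gitignore",            # dotfile
]


def file_extension(path):
    """Return the extension of the last path component ('' if none)."""
    return os.path.splitext(os.path.basename(path))[1]


extensions = {p: file_extension(p) for p in example_paths}
kept = [p for p, ext in extensions.items() if ext != ""]
print(kept)
```

Landing pages and directory listings are dropped because they have no extension. Dotfiles are dropped as well, since `os.path.splitext` treats a leading dot as part of the name, while a directory whose name contains a dot would be kept.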

In [None]:
# Annual Popular Resources Paths
# TODO

## Monthly Top Referrers
Note that these numbers are not exact monthly counts. Each monthly total covers 28 days of data, summed from the two 2-week collection periods whose end dates fell within the month.

In [None]:
# Monthly Top Referrers
refs['month'] = refs['date'].dt.to_period('M')
monthly_refs = (refs.groupby(['month','referrer'])['count'].sum().reset_index())
monthly_refs.sort_values(['month','count'],ascending=[True,False]).groupby('month').head(10).reset_index(drop=True)

In [None]:
# Annual Top Referrers
# TODO

## Contact Info  

Email: <LPDAAC@usgs.gov>  
Voice: +1-866-573-3222  
Organization: Land Processes Distributed Active Archive Center (LP DAAC)¹  
Website: <https://www.earthdata.nasa.gov/centers/lp-daac>  

¹Work performed under USGS contract 140G0121D0001 for NASA contract NNG14HH33I. 