## Goal: understanding contribution patterns

Analyzing groups of contributors by their activity patterns, and how those groups evolve over time, helps to understand the structure of the community. The groups are defined by how active their members are (from casual to core contributors) and by the kinds of activity they perform (for example, producing code, reviewing code, submitting issues, or contributing to discussions). Whenever convenient, this characterization will be combined with the contributor groups identified in the first goal.

This goal is refined into the following questions:

**Questions**:

 * How often do contributors contribute?
 * How is contribution structured by level of activity?
 * How is contribution structured across the different data sources?
 * How do these structures of contribution evolve over time?
 * How do people flow within the structure of contribution?

These questions can be answered with the following metrics:

**Metrics**:

(Still to be refined)

 * Groups of contributors, by level of activity (core, regular, casual)
 * Groups of contributors, by kind of activity (committing, opening issues, merging pull requests, etc.)
 * Groups of contributors, by breadth of activity (specialists, spread, etc.)
 * Activity metrics for each group
 * Absolute number of contributors moving from one group to another
 * Fraction of contributors moving from a group to another
 
Some of these metrics will be computed over time for the specified contributor groups.
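
As a rough illustration of the last two metrics in the list, the sketch below counts contributors moving between groups from one period to the next, and the fraction of each origin group that moved. The function name, the input dictionaries and the placeholder labels `new` and `gone` are assumptions made for this example; they are not part of the analysis in this notebook.

```python
from collections import Counter


def group_transitions(previous, current):
    """Count moves between groups from one period to the next.

    `previous` and `current` map contributor id -> group label.
    Contributors missing from a period get the placeholder labels
    'new' (not seen before) or 'gone' (not seen afterwards).
    """
    moves = Counter()
    for contributor in set(previous) | set(current):
        origin = previous.get(contributor, 'new')
        target = current.get(contributor, 'gone')
        moves[(origin, target)] += 1

    # Size of each origin group, used as the denominator of the fractions
    origin_totals = Counter()
    for (origin, _), n in moves.items():
        origin_totals[origin] += n

    fractions = {pair: n / origin_totals[pair[0]] for pair, n in moves.items()}
    return moves, fractions


# Hypothetical labels for two consecutive quarters
q1 = {'a': 'core', 'b': 'regular', 'c': 'casual', 'd': 'casual'}
q2 = {'a': 'core', 'b': 'core', 'c': 'casual', 'e': 'casual'}

moves, fractions = group_transitions(q1, q2)
for (origin, target), n in sorted(moves.items()):
    print(origin, '->', target, n, round(fractions[(origin, target)], 2))
```

Keeping the absolute counts and the fractions side by side makes it easier to tell a large flow out of a large group from a large flow out of a small one.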

# Metric Calculations
First we need to open a connection to the proper ES instance. We use an external module to load credentials from a file that is not shared. To run this notebook, use your own credentials: put them in a file named '.settings' (in the same directory as this notebook), following the example file 'settings.sample'.

This section includes common code to manage and plot data. Queries are shown in the corresponding sections.


In [60]:
import pandas

import os

import plotly
import plotly.graph_objs as go

import util as ut

from util import ESConnection
from elasticsearch_dsl import Search

es_conn = ESConnection()

# Let's load projects from the REVIEWED SPREADSHEET
projects = ut.read_projects("data/Contributors and Communities Analysis - Project grouping.xlsx")

project_name = os.environ.get('PROJECT', 'all')

date_range = {'gte': '1998-01-01', 'lt': 'now/y'}


In [68]:
def create_search(source):
    s = Search(using=es_conn, index=source)
    
    if source == 'git' or source == 'github':
        github = projects['Github']
        repos = github['Repo'].tolist()
        #print (repos)
        s = s.filter('terms', repo_name=repos)
    
    # TODO: Add bot and merges filtering.
    #s = s.filter('range', grimoire_creation_date={'gt': 'now/M-2y', 'lt': 'now/M'})
    return s

In [62]:
def print_df(result, group_field, value_field, group_column, value_column):
    """Builds a two-column dataframe (group, value) from a single-level bucket aggregation
    """
    buckets = result.to_dict()['aggregations'][group_field]['buckets']
    df = pandas.DataFrame(buckets)
    df = df.drop('doc_count', axis=1)
    # Each metric cell is a dict such as {'value': 3}; keep only the number
    df[value_field] = df[value_field].apply(lambda cell: cell['value'])
    df = df[['key', value_field]]
    df.columns = [group_column, value_column]

    return df

In [63]:
def stack_by(result, group_column, time_column, value_column, group_field, time_field, value_field):
    """Creates a dataframe from a group -> time nested aggregation, one row per (group, time) pair
    """
    df = pandas.DataFrame(columns=[group_column, time_column, value_column])

    for b in result.to_dict()['aggregations'][group_field]['buckets']:
        for i in b[time_field]['buckets']:
            df.loc[len(df)] = [b['key'], i['key_as_string'], i[value_field]['value']]
    
    return df

def stack_by_2(result, group_column, time_column, value_column, group_field, time_field, value_field):
    """Creates a dataframe from a time -> group nested aggregation, one row per (time, group) pair
    """
    df = pandas.DataFrame(columns=[time_column, group_column, value_column])

    for b in result.to_dict()['aggregations'][time_field]['buckets']:
        for i in b[group_field]['buckets']:
            df.loc[len(df)] = [b['key_as_string'], i['key'], i[value_field]['value']]
    
    return df

def stack_by_3(result, group_column, time_column, value_column, group_field, time_field, value_field):
    """Creates a dataframe from a time -> group -> org nested aggregation, one row per (time, group, org)
    """
    df = pandas.DataFrame(columns=[time_column, group_column, 'org', value_column])

    for b in result.to_dict()['aggregations'][time_field]['buckets']:
        for i in b[group_field]['buckets']:
            for org in i['org']['buckets']:
                df.loc[len(df)] = [b['key_as_string'], i['key'], org['key'], org[value_field]['value']]
    
    return df
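
To make the shape handled by `stack_by_3` concrete, here is a minimal sketch that flattens a hand-written stand-in for the nested aggregation response. The sample dictionary, its keys (`author-1`, `Org A`, ...) and the helper name `flatten_buckets` are hypothetical; only the `aggregations` / `buckets` / `key` / `key_as_string` / `value` layout is taken from the functions above.

```python
def flatten_buckets(response, time_field, group_field, org_field, value_field):
    """Turn a time -> group -> org nested aggregation into flat rows."""
    rows = []
    for time_bucket in response['aggregations'][time_field]['buckets']:
        for group_bucket in time_bucket[group_field]['buckets']:
            for org_bucket in group_bucket[org_field]['buckets']:
                rows.append((
                    time_bucket['key_as_string'],
                    group_bucket['key'],
                    org_bucket['key'],
                    org_bucket[value_field]['value'],
                ))
    return rows


# Hypothetical response with one quarter, two authors, three (author, org) pairs
sample = {
    'aggregations': {
        'time': {'buckets': [{
            'key_as_string': '1998-01-01T00:00:00.000Z',
            'uuid': {'buckets': [
                {'key': 'author-1', 'org': {'buckets': [
                    {'key': 'Org A', 'commits': {'value': 12}},
                    {'key': 'Org B', 'commits': {'value': 3}},
                ]}},
                {'key': 'author-2', 'org': {'buckets': [
                    {'key': 'Org A', 'commits': {'value': 7}},
                ]}},
            ]},
        }]},
    },
}

rows = flatten_buckets(sample, 'time', 'uuid', 'org', 'commits')
for row in rows:
    print(row)
```

Collecting the rows in a list and building the dataframe once, for instance with `pandas.DataFrame(rows, columns=['time', 'uuid', 'org', 'commits'])`, is usually faster than appending with `df.loc[len(df)]` row by row, which matters when there are tens of thousands of rows.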

In [64]:
def onion_label(df, value_column):
    """Labels each row as core, regular or casual.

    Expects df sorted by value_column in descending order, with
    'uuid', 'time' and 'org' columns.
    """
    total = df[value_column].sum()

    percent_80 = total * 0.8
    percent_95 = total * 0.95
    # Running total of the contributions of the rows labeled so far
    covered = 0

    labeled_df = pandas.DataFrame(
        columns=['uuid', 'time', 'org', 'onion'])

    for _, row in df.iterrows():
        if covered < percent_80:
            onion_group = 'core'
        elif covered < percent_95:
            onion_group = 'regular'
        else:
            onion_group = 'casual'

        covered += row[value_column]
        labeled_df.loc[len(labeled_df)] = [row['uuid'], row['time'], row['org'], onion_group]

    return labeled_df


def onion_labeled_evolution(df, bucket_field, time_field, metric_field):
    """Applies onion_label to each time slice of df and concatenates the results
    """
    onion_df = pandas.DataFrame(
        columns=['uuid', 'time', 'org', 'onion'])

    for time in df[time_field].unique():
        # Slice on time_field (not a hard-coded 'time') so it matches the loop above
        slice_df = df.loc[df[time_field] == time]
        slice_df = slice_df.sort_values(by=metric_field, ascending=False)

        onion_labeled_result = onion_label(slice_df, value_column=metric_field)

        # Number of distinct contributors in this time slice
        print(time, '->', len(slice_df[bucket_field].unique()))

        onion_df = pandas.concat([onion_df, onion_labeled_result])

    return onion_df

In [65]:
def print_grouped_bar(df, x_column, value_columns, title):
    """Plots one bar per value column, side by side, for each x value
    """
    plotly.offline.init_notebook_mode(connected=True)

    bars = []
    x_values = df[x_column].tolist()
    for value_column in value_columns:
        bars.append(go.Bar(
            x=x_values,
            y=df[value_column].tolist(),
            name=value_column))

    layout = go.Layout(
        barmode='group',
        title= title
    )

    fig = go.Figure(data=bars, layout=layout)
    plotly.offline.iplot(fig, filename='grouped-bar')
    
def print_stacked_bar(df, x_column, value_columns, title):
    """Plots one bar per x value, stacking the value columns on top of each other
    """
    plotly.offline.init_notebook_mode(connected=True)

    bars = []
    x_values = df[x_column].tolist()
    for value_column in value_columns:
        bars.append(go.Bar(
            x=x_values,
            y=df[value_column].tolist(),
            name=value_column))

    layout = go.Layout(
        barmode='stack',
        title= title
    )

    fig = go.Figure(data=bars, layout=layout)
    plotly.offline.iplot(fig, filename='stacked-bar')

In [66]:
from elasticsearch_dsl import Q

def add_bot_filter(s):
    return s.filter('term', author_bot='false')

def add_merges_filter(s):
    return s.filter('range', files={'gt': 0})

def add_date_filter(s):
    q = Q('range')
    q.metadata__updated_on = {'gte': '1998-01-01', 'lt': 'now/y'}
    #q.metadata__updated_on = {'gte': '2010-01-01', 'lt': 'now/y'}

    return s.filter(q)

def add_general_date_filters(s):
    # 01/01/1998
    initial_ts = '883609200000'
    q = Q('range')
    q.metadata__updated_on={'gt': initial_ts}

    return s.filter(q)


def add_project_filter(s, project_name):

    if project_name.lower() != 'all':
        s = s.filter('term', project=project_name)

    return s

    # Let's load projects from the REVIEWED SPREADSHEET
    #projects = get_projects()
    # 
    # if project_name.lower() != 'all':
    #     github = projects['Github']
    #     repos = github[github['Project'] == project_name]['Repo'].tolist()
    #     #print(repos)
    #     s = s.filter('terms', repo_name=repos)
    # return s

# Let's load projects from the REVIEWED SPREADSHEET
#projects = ut.read_projects("data/Contributors and Communities Analysis - Project grouping.xlsx")
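
To see what the chained helpers amount to, the sketch below builds by hand a request body that should roughly correspond to the filters applied in the git queries of this notebook. It is an approximation written for illustration, not the exact output of `elasticsearch_dsl`; the function name `commit_query_body` and the sample repository and project names are hypothetical. The real body can be inspected with `s.to_dict()`.

```python
import json


def commit_query_body(repos, project_name='all'):
    """Build a request body resembling the chained filters used in this notebook."""
    filters = [
        {'terms': {'repo_name': repos}},      # create_search: restrict to known repos
        {'term': {'author_bot': 'false'}},    # add_bot_filter: drop bot authors
        {'range': {'files': {'gt': 0}}},      # add_merges_filter: drop commits touching no files
    ]
    if project_name.lower() != 'all':         # add_project_filter: optional project restriction
        filters.append({'term': {'project': project_name}})
    # add_date_filter: from 1998 up to the start of the current year
    filters.append({'range': {'metadata__updated_on':
                              {'gte': '1998-01-01', 'lt': 'now/y'}}})
    return {'query': {'bool': {'filter': filters}}}


# Hypothetical repository and project names
body = commit_query_body(['example/repo'], project_name='Example')
print(json.dumps(body, indent=2))
```

All conditions sit in the `filter` clause of a `bool` query, so they restrict the matching documents without affecting scoring.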


# Metrics

## Groups of contributors, by level of activity: core, regular, casual

The following table and chart show the number of contributors in three groups:
* Core: the smallest set of most active authors who together made 80% of contributions.
* Regular: the next most active authors, whose contributions take the cumulative share from 80% to 95%.
* Casual: the remaining contributors, who made the last 5% of contributions.

Looking at how these groups change through time shows the structure of the community at a given point and how it evolves.
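
The grouping rule can be sketched on a handful of made-up commit counts. This is a minimal, pandas-free illustration that follows the same cumulative logic as `onion_label`; the author names and numbers are hypothetical.

```python
def onion_groups(commits_by_author, core_share=0.80, regular_share=0.95):
    """Label authors as core, regular or casual by cumulative share of commits."""
    total = sum(commits_by_author.values())
    labels = {}
    covered = 0  # commits of the authors labeled so far

    # Most active authors first
    ranked = sorted(commits_by_author.items(), key=lambda item: item[1], reverse=True)
    for author, commits in ranked:
        if covered < total * core_share:
            labels[author] = 'core'
        elif covered < total * regular_share:
            labels[author] = 'regular'
        else:
            labels[author] = 'casual'
        covered += commits

    return labels


# Hypothetical commit counts for one quarter (100 commits in total)
commits = {'a': 50, 'b': 35, 'c': 8, 'd': 4, 'e': 3}
labels = onion_groups(commits)
print(labels)
```

Here `a` and `b` are core (together they reach 85 commits, crossing the 80% mark), `c` and `d` are regular (reaching 97), and `e` is casual. Note that the author whose commits cross a threshold is still counted in the more active group.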

In [69]:
s = create_search(source='git')

#s = add_general_date_filters(s)

s = add_bot_filter(s)
s = add_merges_filter(s)

# Filter commits to the Project Repos
s = add_project_filter(s, project_name)

# Adds date range to retrieve data from
s = add_date_filter(s)


# Unique count of Commits by Authors over time
s.aggs.bucket('time', 'date_histogram', field='metadata__updated_on', interval='quarter')\
    .bucket('uuid', 'terms', field='author_uuid', size=100000)\
    .bucket('org', 'terms', field='author_org_name', size=100000)\
    .metric('commits', 'cardinality', field='hash', precision_threshold=3000)

result = s.execute()

authors_df = stack_by_3(result, 'uuid', 'time', 'commits', 'uuid', 'time', 'commits')


# Divide authors in Employees and Non-Employees based on org name
authors_df.loc[authors_df['org'] == 'Mozilla Staff', 'org'] = 'Employees'
authors_df.loc[authors_df['org'] == 'Code Sheriff', 'org'] = 'Employees'
authors_df.loc[authors_df['org'] != 'Employees', 'org'] = 'Non-Employees'

In [70]:
onion_df = onion_labeled_evolution(authors_df, bucket_field='uuid', time_field='time', metric_field='commits')

# Calculate quarters
#onion_df['Quarter'] = pandas.PeriodIndex(pandas.to_datetime(onion_df.Time), freq='Q')
onion_df['Quarter'] = onion_df['time'].map(lambda x: str(pandas.Period(x,'Q')))

onion_df

1998-01-01T00:00:00.000Z -> 6
1998-04-01T00:00:00.000Z -> 94
1998-07-01T00:00:00.000Z -> 163
1998-10-01T00:00:00.000Z -> 112
1999-01-01T00:00:00.000Z -> 116
1999-04-01T00:00:00.000Z -> 134
1999-07-01T00:00:00.000Z -> 151
1999-10-01T00:00:00.000Z -> 146
2000-01-01T00:00:00.000Z -> 158
2000-04-01T00:00:00.000Z -> 178
2000-07-01T00:00:00.000Z -> 189
2000-10-01T00:00:00.000Z -> 160
2001-01-01T00:00:00.000Z -> 169
2001-04-01T00:00:00.000Z -> 194
2001-07-01T00:00:00.000Z -> 181
2001-10-01T00:00:00.000Z -> 186
2002-01-01T00:00:00.000Z -> 192
2002-04-01T00:00:00.000Z -> 191
2002-07-01T00:00:00.000Z -> 186
2002-10-01T00:00:00.000Z -> 172
2003-01-01T00:00:00.000Z -> 156
2003-04-01T00:00:00.000Z -> 141
2003-07-01T00:00:00.000Z -> 130
2003-10-01T00:00:00.000Z -> 101
2004-01-01T00:00:00.000Z -> 106
2004-04-01T00:00:00.000Z -> 106
2004-07-01T00:00:00.000Z -> 109
2004-10-01T00:00:00.000Z -> 107
2005-01-01T00:00:00.000Z -> 129
2005-04-01T00:00:00.000Z -> 144
2005-07-01T00:00:00.000Z -> 181
2005-10-01T

index,uuid,time,org,onion,Quarter
0,360bda5d687dec2cadfb804f08c53e32c9a00a27,1998-01-01T00:00:00.000Z,Non-Employees,core,1998Q1
1,be4def1ae593ac3e9a3e6a7d3eb847e93bb21bb2,1998-01-01T00:00:00.000Z,Non-Employees,core,1998Q1
2,2a205325afe7a7ba7f2e44ab0b0f7a0dc5a9de00,1998-01-01T00:00:00.000Z,Non-Employees,core,1998Q1
3,c612c17daa0436547fdf8dde7038b13ddc77f0d9,1998-01-01T00:00:00.000Z,Non-Employees,core,1998Q1
4,d95d6e15648451cfc45885ee0de13b85f412403a,1998-01-01T00:00:00.000Z,Non-Employees,regular,1998Q1
5,fee82d83c32b57da8be70bb1a60fdd18945a4d89,1998-01-01T00:00:00.000Z,Non-Employees,regular,1998Q1
0,4782bfeacd4153cda87b5f1111afee53ac1187ee,1998-04-01T00:00:00.000Z,Non-Employees,core,1998Q2
1,431b3b9443363d840dc45885370dce1ebf0bb6df,1998-04-01T00:00:00.000Z,Non-Employees,core,1998Q2
2,b8a291c68e2361bc52b1fd6097bbf5387cc72433,1998-04-01T00:00:00.000Z,Non-Employees,core,1998Q2
3,8426e77b1adabfc17cf61da651a0194683856612,1998-04-01T00:00:00.000Z,Non-Employees,core,1998Q2
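
The `Quarter` column is derived from the bucket timestamp with `pandas.Period`. As a sketch of what that label means, the standard-library version below computes the same kind of `YYYYQn` string and steps back one quarter; the helper names `quarter_label` and `previous_quarter` are assumptions for this example.

```python
from datetime import datetime


def quarter_label(timestamp):
    """Return a label such as '1998Q2' for a timestamp string starting with YYYY-MM-DD."""
    # Only the date part is needed; the time of day and trailing 'Z' are ignored
    day = datetime.strptime(timestamp[:10], '%Y-%m-%d')
    quarter = (day.month - 1) // 3 + 1
    return '{}Q{}'.format(day.year, quarter)


def previous_quarter(label):
    """Return the label of the quarter before `label`, e.g. '1998Q1' -> '1997Q4'."""
    year, quarter = int(label[:4]), int(label[5])
    if quarter == 1:
        return '{}Q4'.format(year - 1)
    return '{}Q{}'.format(year, quarter - 1)


print(quarter_label('1998-04-01T00:00:00.000Z'))  # 1998Q2
print(previous_quarter('1998Q1'))                 # 1997Q4
```

Because the labels are zero-padded years followed by a single quarter digit, comparing them as plain strings gives chronological order.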


In [72]:
import csv

def parse_csv(filepath):
    with open(filepath) as csvfile:
        reader = csv.DictReader(csvfile, delimiter=',', quotechar='"')
        for row in reader:
            yield row

def read_uuids(uuids_filepath):
    uuid_set = set()
    for row in parse_csv(uuids_filepath):
        uuid = row['uuid']
        uuid_set.add(uuid)

    return uuid_set


uuid_set = read_uuids('tools/dates-uuids.csv')
print('Read UUIDs:', len(uuid_set))

def query_max_non(uuid, emp_groups):
    s = create_search(source='git')
    
    s = add_bot_filter(s)
    s = add_merges_filter(s)
    # Filter commits to the Project Repos
    s = add_project_filter(s, project_name)

    s = s.filter('term', author_uuid=uuid)
    s = s.exclude('term', project='Unknown')
    s = s.exclude('terms', author_org_name=emp_groups)
    
    # For some reason this filter must be last one to escape '__' correctly
    # Adds date range to retrieve data from
    s = add_date_filter(s)
    
    s = s.aggs.metric('max', 'max', field='metadata__updated_on')
    
    # don't return any fields, just the metadata
    s = s.source(False)
    
    #print('max:', s.to_dict())
    
    return s.execute()

def query_min_emp(uuid, emp_groups):
    s = create_search(source='git')
    
    s = add_bot_filter(s)
    s = add_merges_filter(s)
    # Filter commits to the Project Repos
    s = add_project_filter(s, project_name)

    s = s.filter('term', author_uuid=uuid)
    s = s.exclude('term', project='Unknown')
    s = s.filter('terms', author_org_name=emp_groups)
   
    # Adds date range to retrieve data from
    s = add_date_filter(s)

    s = s.aggs.metric('min', 'min', field='metadata__updated_on')
    
    # don't return any fields, just the metadata
    s = s.source(False)
    
    #print('min:', s.to_dict())
    
    return s.execute()

def query_max_non_filtered(uuid, emp_groups, min_emp_str):
    s = create_search(source='git')
    
    s = add_bot_filter(s)
    s = add_merges_filter(s)
    # Filter commits to the Project Repos
    s = add_project_filter(s, project_name)


    s = s.filter('term', author_uuid=uuid)
    s = s.exclude('term', project='Unknown')
    s = s.exclude('terms', author_org_name=emp_groups)
    q = Q('range')
    q.metadata__updated_on = {'gte': '1998-01-01', 'lt': min_emp_str}
    s = s.filter(q)
    
    s = s.aggs.metric('max', 'max', field='metadata__updated_on')

    # don't return any fields, just the metadata
    s = s.source(False)
    
    #print('max filt:', s.to_dict())

    return s.execute()


hired_onion_df = pandas.DataFrame(columns=['uuid', 'time', 'org', 'onion'])

## Query non-employee max date and employee min date
##   - If non-emp max < emp min => take onion group for quarter corresponding to non-emp max
##   - Otherwise, query for non-emp max filtering results to get only those older than emp_min
##       - get onion for quarter corresponding to new non-emp max 

emp_groups = ['Mozilla Staff', 'Code Sheriff']
#date_range = {'gte': '2010-01-01', 'lt': 'now/y'}
count = 0
not_found = 0
employee_commit_first = 0
for uuid in uuid_set:
    count += 1
    if (count % 100 == 0):
        print(count)
    
    response = query_max_non(uuid, emp_groups)

    # If there aren't any commits as Non-Employee we don't have onion data 
    if response.hits.total == 0:
        #print('No commits found for', uuid, 'as non-employee')
        not_found += 1
        continue
        
    #print(uuid)
       
    max_nonemp = response.to_dict()['aggregations']['max']['value']
    max_nonemp_str = response.to_dict()['aggregations']['max']['value_as_string']
    
    #print('hits:', response.hits.total)
    #print(max_nonemp_str)
    
    response = query_min_emp(uuid, emp_groups)
    
    # If there are commits as Non-Employee (checked above) and Employee (checked below)
    if response.hits.total > 0:
        min_emp = response.to_dict()['aggregations']['min']['value']
        min_emp_str = response.to_dict()['aggregations']['min']['value_as_string']
        #print(max_nonemp_str, min_emp_str)
        if max_nonemp < min_emp:
            period_non = pandas.Period(max_nonemp_str,'Q')
            period_emp = pandas.Period(min_emp_str,'Q')
            
            # If both dates are from same quarter, get onion from previous quarter with commits
            # as non-employee
            if period_non == period_emp:
                available_periods = onion_df.loc[(onion_df['org'] == 'Non-Employees') &
                        (onion_df['uuid'] == uuid)]['Quarter'].unique()
                i = 1
                while True:
                    period = str(pandas.Period(max_nonemp_str,'Q')-i)
                    #print(str(period))
                    i += 1
                    if period in available_periods or period <= str(pandas.Period('1997Q4')):
                        break

            else:
                period = str(period_non)
                
            group = onion_df.loc[(onion_df['org'] == 'Non-Employees') &
                        (onion_df['uuid'] == uuid) &
                        (onion_df['Quarter'] == period)]['onion']
            if len(group) > 0:
                hired_onion_df.loc[len(hired_onion_df)] = [uuid, period, 'Non-Employees', group.values[0]]
                #print('1.', group.values[0])
            else:
                # When commits as non-employee were made before 1998 we don't have onion data for them
                hired_onion_df.loc[len(hired_onion_df)] = [uuid, period, 'Non-Employees', 'none-1']
                print('1.', period, uuid)
        
        # There are commits as non-employee after being employee, get last quarter before being employee
        # query for non-emp max filtering results to get only those older than emp_min
        ##       - get onion for quarter corresponding to new non-emp max 
        else:
            
            response = query_max_non_filtered(uuid, emp_groups, min_emp_str)
            
            if response.hits.total > 0:
                max_before = response.to_dict()['aggregations']['max']['value_as_string']
                period = str(pandas.Period(max_before,'Q'))
                
                group = onion_df.loc[(onion_df['org'] == 'Non-Employees') &
                        (onion_df['uuid'] == uuid) &
                        (onion_df['Quarter'] == period)]['onion']
                    
                if len(group) > 0:
                    hired_onion_df.loc[len(hired_onion_df)] = [uuid, period, 'Non-Employees', group.values[0]]
                else:
                    # When commits as non-employee were made before 1998 we don't have onion data for them
                    hired_onion_df.loc[len(hired_onion_df)] = [uuid, period, 'Non-Employees', 'none-2']
            else:
                ## No commits found before first employee commit
                employee_commit_first += 1 

    
    # If there are only commits as Non-Employee (checked above), take last onion group
    else:
        period = str(pandas.Period(max_nonemp_str,'Q'))
        #print(period)
        group = onion_df.loc[(onion_df['org'] == 'Non-Employees') &
                    (onion_df['uuid'] == uuid) &
                    (onion_df['Quarter'] == period)]['onion']
        if len(group) > 0:
            hired_onion_df.loc[len(hired_onion_df)] = [uuid, period, 'Non-Employees', group.values[0]]
            #print('2.', group.values[0])
        else:
            hired_onion_df.loc[len(hired_onion_df)] = [uuid, period, 'Non-Employees', 'none-3']
            print('3.', period, uuid)

print('not found:', not_found)
print('employee_commit_first:', employee_commit_first)

hired_onion_df


Read UUIDs: 1076
100
200
300
1. 1997Q4 322e51194cdf78c23d517f3548ede5c5c0e972c3
1. 1997Q4 05dd034c58c6940a84f28c9377700c0083e1d7f7
400
500
1. 1997Q4 29475803048bdfd286019a910dc1e21d983dcaed
600
700
800
900
1. 1997Q4 0e0b6cd5c9c108c40b9f2605969054dbdb29c8e0
1000
not found: 690
employee_commit_first: 92


index,uuid,time,org,onion
0,0134e2ebeec7f2f1acf7c635b5b22e2c441bd3ec,2009Q4,Non-Employees,casual
1,491b27f1c6560330725414d32fc5090b5c40e02a,2015Q3,Non-Employees,core
2,14e3bbbe3cca5f6861f737634f340aa034f1182a,2007Q1,Non-Employees,core
3,91dfc9eec717efdf81affc16fedcea176ac75df5,2012Q1,Non-Employees,core
4,00834d313bfc6fc60be1631bcc57b2c05ee2e0e3,2010Q1,Non-Employees,regular
5,aaf0f869ca958068c4b170ef99093297c005d53b,2014Q1,Non-Employees,casual
6,034873622c2212a4ed33704d19a082647ced26b2,2012Q4,Non-Employees,casual
7,4bd11a9c490003972d8447a7cbda3bc39ae479b6,2013Q1,Non-Employees,regular
8,6fe046b09296975a2e1cc84d63abacf42c6d31f6,2012Q4,Non-Employees,regular
9,111e755f5e636487882921221c9812acc877850c,2012Q1,Non-Employees,casual


In [73]:
grouped_hired_df = hired_onion_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df


onion,uuid
casual,152
core,68
none-1,4
regular,70


In [74]:
hired_2016_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2016Q1', '2016Q2' , '2016Q3', '2016Q4'])]

grouped_hired_df = hired_2016_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
casual,18
core,9
regular,5


In [75]:
hired_2015_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2015Q1', '2015Q2' , '2015Q3', '2015Q4'])]

grouped_hired_df = hired_2015_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
casual,20
core,7
regular,9


In [76]:
hired_2012_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2012Q1', '2012Q2' , '2012Q3', '2012Q4'])]

grouped_hired_df = hired_2012_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
casual,18
core,9
regular,3


In [77]:
hired_2010_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2010Q1', '2010Q2' , '2010Q3', '2010Q4'])]

grouped_hired_df = hired_2010_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
casual,10
core,7
regular,14


In [78]:
hired_2009_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2009Q1', '2009Q2' , '2009Q3', '2009Q4'])]

grouped_hired_df = hired_2009_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
casual,12
core,5
regular,3


In [79]:
hired_2008_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2008Q1', '2008Q2' , '2008Q3', '2008Q4'])]

grouped_hired_df = hired_2008_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
casual,9
core,9
regular,3


In [80]:
hired_2007_df = hired_onion_df.loc[hired_onion_df['time'].isin(['2007Q1', '2007Q2' , '2007Q3', '2007Q4'])]

grouped_hired_df = hired_2007_df.groupby(['onion']).agg({'uuid': 'count'})
grouped_hired_df

onion,uuid
core,1
regular,1
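
The per-year cells above repeat the same filter and count for each year. As a sketch of an alternative, the breakdown for every year can be computed in a single pass by keying on the first four characters of the quarter label. The input below is just the ten `(time, onion)` pairs visible in the `hired_onion_df` output shown earlier, so the counts are not comparable with the full tables.

```python
from collections import Counter

# (quarter, onion group) pairs copied from the ten rows of hired_onion_df shown earlier
hired_rows = [
    ('2009Q4', 'casual'), ('2015Q3', 'core'), ('2007Q1', 'core'),
    ('2012Q1', 'core'), ('2010Q1', 'regular'), ('2014Q1', 'casual'),
    ('2012Q4', 'casual'), ('2013Q1', 'regular'), ('2012Q4', 'regular'),
    ('2012Q1', 'casual'),
]

# Count contributors per (year, onion group); the year is the label prefix
by_year = Counter((quarter[:4], onion) for quarter, onion in hired_rows)

for (year, onion), n in sorted(by_year.items()):
    print(year, onion, n)
```

With the dataframe itself, grouping by `hired_onion_df['time'].str[:4]` together with the `onion` column should give the same kind of table without one cell per year.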


In [56]:
### LOOK FOR THOSE PEOPLE WHO COMMITTED TO FIREFOX OS. DO THEY COMMIT TO OTHER PROJECTS LATER ON?

group = onion_df.loc[onion_df['uuid'] == '9248ea74bea50a9f63f9adca58a9283318c6c4aa']
group

index,uuid,time,org,onion,Quarter
527,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2012-01-01T00:00:00.000Z,Employees,casual,2012Q1
689,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2012-01-01T00:00:00.000Z,Non-Employees,casual,2012Q1
485,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2012-04-01T00:00:00.000Z,Employees,casual,2012Q2
384,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2012-07-01T00:00:00.000Z,Employees,regular,2012Q3
433,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2012-10-01T00:00:00.000Z,Employees,regular,2012Q4
496,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2013-01-01T00:00:00.000Z,Employees,regular,2013Q1
893,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2014-01-01T00:00:00.000Z,Employees,casual,2014Q1
1009,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2014-04-01T00:00:00.000Z,Employees,casual,2014Q2
1001,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2014-10-01T00:00:00.000Z,Employees,casual,2014Q4
1196,9248ea74bea50a9f63f9adca58a9283318c6c4aa,2015-01-01T00:00:00.000Z,Employees,casual,2015Q1


In [71]:
group = authors_df.loc[authors_df['uuid'] == '9248ea74bea50a9f63f9adca58a9283318c6c4aa']
group

index,time,uuid,org,commits
14583,2011-10-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Non-Employees,57.0
15284,2012-01-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Non-Employees,138.0
15285,2012-01-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,103.0
16288,2012-04-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,145.0
17323,2012-07-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,172.0
18486,2012-10-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,74.0
19582,2013-01-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,72.0
20918,2013-04-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,55.0
22586,2013-07-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,23.0
23757,2013-10-01T00:00:00.000Z,9248ea74bea50a9f63f9adca58a9283318c6c4aa,Employees,44.0
