Analysis: Collecting Data on Issue Completion per Prework Author and Creating Looker Dashboards to Uncover Insights #4152

Open
22 of 28 tasks
kimberlytanyh opened this issue Mar 12, 2023 · 16 comments
Labels
  • Complexity: Small (Take this type of issues after the successful merge of your second good first issue)
  • Dependency (An issue is blocking the completion or starting of another issue)
  • Feature: Board/GitHub Maintenance (Project board maintenance that we have to do repeatedly)
  • feature: ladder progress dashboard
  • role: data analyst
  • size: 2pt (Can be done in 7-12 hours)
  • Status: Updated (No blockers and update is ready for review)

Comments

@kimberlytanyh
Member

kimberlytanyh commented Mar 12, 2023

Dependency

Overview

We need to collect data on the authors of all the prework issues in our repository to perform data analysis.

Action Items

  • Find URL for GitHub REST API documentation and add it to the resources below

  • Read relevant sections in GitHub API documentation on retrieving data with REST APIs

  • Search for other resources on platforms or libraries and syntax to use to retrieve data with GitHub REST APIs

  • Download Postman to retrieve needed JSON data via GitHub REST API (based on online tutorials)

  • Read documentation on rate limiting

  • Retrieve data on all prework issues (date range from Nov 1, 2021 to now) using REST API in Jupyter Notebook

  • Put JSON data in a tabular format and clean data

  • Get distribution of issues completed by each complexity level for each prework author:
    Put data in columns: GitHub Handle, Date Prework Closed, No. of Good First Issues Completed, No. of Good Second Issues Completed, No. of Small Complexity Issues Completed, No. of Medium Complexity Issues Completed, No. of Large Complexity Issues Completed

  • Export data as Excel file and add to Google Drive folder (GitHub Data Analysis)

  • Manually check accuracy of numbers in dataset/spreadsheet

  • Write documentation on process and considerations (not complete yet)

  • Duplicate data in another spreadsheet and perform following analysis:

  1. Of the 202 people, how many people left the team?
  2. How many people started and got to "Complexity: Large" issues (completed at least 2 good first and good second issues combined, and at least one issue of every other complexity type)?
  • Perform above analysis again on only closed prework issues.

  • Clean data and get number and percentage of closed large issues that were unassigned in Google Sheets

  • Create Google spreadsheet with list of issues that have more than one complexity label and unassigned closed large issues.

  • Perform cohort analysis on closed prework authors

  1. Clean data from API and create dataset
  2. Import into Google Drive and visualize data in Google Sheets
  • Research how to connect data to Looker Studio in a way that new data can come in and Looker visualizations are automatically updated.

  • Create new repository with Sophia and Chelsey's help that has GitHub Actions that perform cron job so that Python script can be run automatically daily for fresh data.

  • Add automation components to Python script and verify data cleaning accuracy.

  • Create Looker dashboard with data pulled in.

  • Refine the Looker dashboard so that it is more intuitive

  • Investigate correlation between number of issues available and cohort performance:

    • Design analysis and investigate where/how data can be obtained

Might be separated into another issue

  • Get project board column data from GitHub and clean the data
  • Set up data source and create Looker dashboard to show live number of issues available per role
  • Create separate dashboard pages for developers (front end, back end, front and back end, and dev lead)
  • Create documentation of process for GitHub class using Hack for LA template
  • Set up automation to run the Python script so that the dashboard updates automatically
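
The retrieval step in the action items above (all issues from Nov 1, 2021 onward via the REST API) could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the function names and the injectable `fetch` argument are hypothetical, the owner and repo are placeholders, and identifying which issues are prework issues is left out because the issue does not say how they are recognized. It filters on `created_at` client-side, since the endpoint's own `since` parameter filters on last update time rather than creation time.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, page, per_page=100):
    """Build the URL for one page of the 'list repository issues' endpoint."""
    query = urllib.parse.urlencode(
        {"state": "all", "per_page": per_page, "page": page}
    )
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def get_json(url, token=None):
    """Fetch one URL and decode the JSON body (this performs a network call)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_issues(owner, repo, created_from, fetch=get_json):
    """Walk the pages until an empty one, keeping issues created on or after created_from."""
    kept, page = [], 1
    while True:
        batch = fetch(issues_url(owner, repo, page))
        if not batch:
            return kept
        # ISO 8601 UTC timestamps in the same format compare correctly as strings.
        kept += [item for item in batch if item["created_at"] >= created_from]
        page += 1
```

A call might look like `fetch_issues("<owner>", "<repo>", "2021-11-01T00:00:00Z")`; passing a different `fetch` makes it possible to try the paging logic without touching the network.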

Resources/Instructions

  1. GitHub API Documentation
  2. GitHub Rate Limiting
  3. Link to GitHub Data Analysis Folder
  4. Spreadsheet with accurate numbers as of 03/26/2023
  5. Link to process documentation
  6. Using Google Sheets API to add and refresh dataframe in Python to Google Sheets:
    https://www.youtube.com/watch?v=sVURhxyc6jE
    https://medium.com/@jb.ranchana/write-and-append-dataframes-to-google-sheets-in-python-f62479460cf0
    https://www.youtube.com/watch?v=3wC-SCdJK2c
  7. Slides documentation process from Python to GitHub
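
For the rate limiting resource listed above, one way to respect the limit is to read the `x-ratelimit-remaining` and `x-ratelimit-reset` response headers that the GitHub REST API returns. The helper below is a hypothetical sketch, not part of the project's script; it assumes the headers have been copied into a plain dict with lowercase keys.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, or 0 if quota remains."""
    now = time.time() if now is None else now
    # Assumption: missing headers are treated as "quota still available".
    remaining = int(headers.get("x-ratelimit-remaining", 1))
    reset_at = int(headers.get("x-ratelimit-reset", 0))  # UTC epoch seconds
    if remaining > 0:
        return 0.0
    return max(0.0, reset_at - now)
```

A script could call `time.sleep(seconds_until_reset(headers))` before requesting the next page.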
@github-actions github-actions bot added Feature Missing This label means that the issue needs to be linked to a precise feature label. role missing labels Mar 12, 2023
@kimberlytanyh kimberlytanyh added this to New Issue Approval in Project Board via automation Mar 12, 2023
@kimberlytanyh kimberlytanyh added role: data analyst Complexity: Small Take this type of issues after the successful merge of your second good first issue Feature: Board/GitHub Maintenance Project board maintenance that we have to do repeatedly size: 2pt Can be done in 7-12 hours Draft Issue is still in the process of being created and removed role missing Feature Missing This label means that the issue needs to be linked to a precise feature label. labels Mar 12, 2023
@kimberlytanyh
Member Author

After drafting this issue, add the label "Ready for Product".

@kimberlytanyh kimberlytanyh added ready for product and removed Draft Issue is still in the process of being created labels Mar 12, 2023
@kimberlytanyh kimberlytanyh changed the title Prework Analysis Prework Analysis: Collecting Data on Issue Completion per Prework Author Mar 12, 2023
@ExperimentsInHonesty
Member

@kimberlytanyh Add a step to add data to a google sheet on the Team Google Drive. Add a link to the folder it will go in, under the resources section.

@ExperimentsInHonesty ExperimentsInHonesty moved this from New Issue Approval to In progress (actively working) in Project Board Mar 16, 2023
@github-actions github-actions bot added the Status: Updated No blockers and update is ready for review label Mar 17, 2023
@kimberlytanyh
Member Author

kimberlytanyh commented Mar 19, 2023

Weekly Update:

  1. Progress: Retrieved data and calculated count of issues per complexity label. Remaining: converting rows to columns so that we can see the distribution in one row per assignee, and completing documentation.
  2. Blockers: None
  3. Availability: Mon - Fri, 12:00-5:00PM
  4. ETA: ~21 hours. 1-3 hours for remaining deliverables.
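
The "rows to columns" step mentioned in this update (one row per assignee, one column per complexity level) could look roughly like the sketch below. It uses only the standard library; the input shape, a list of `(handle, label)` pairs, and the exact label strings are assumptions for illustration and may not match the labels used in the repository.

```python
from collections import Counter, defaultdict

# Assumed label names; adjust to the labels actually used in the repository.
COMPLEXITY_LABELS = [
    "good first issue",
    "Complexity: Good second issue",
    "Complexity: Small",
    "Complexity: Medium",
    "Complexity: Large",
]


def to_wide(rows, labels=COMPLEXITY_LABELS):
    """Turn (handle, label) pairs into one dict per handle with a count per label."""
    counts = defaultdict(Counter)
    for handle, label in rows:
        counts[handle][label] += 1
    return [
        # A Counter returns 0 for labels the handle never completed.
        {"GitHub Handle": handle, **{label: per_label[label] for label in labels}}
        for handle, per_label in sorted(counts.items())
    ]
```

Each resulting dict can be written out as one spreadsheet row, for example with `csv.DictWriter`.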

@github-actions github-actions bot removed the Status: Updated No blockers and update is ready for review label Mar 24, 2023
@kimberlytanyh
Member Author

Weekly Update:

  1. Progress: Adjusted data cleaning method and calculated count of issues per complexity label. Exported dataset as csv and uploaded to the drive. Manually checked accuracy of data. Working on data analysis now.
  2. Blockers: None
  3. Availability: Thurs-Saturday, Anytime
  4. ETA: ~17 hours

@ExperimentsInHonesty
Member

@kimberlytanyh we are in the process of changing the labels on issues currently labeled Complexity: Good second issue to good first issue

Why?

  • to improve current and future data analysis
  • to make it easier for devs to know how many issues we need to make at a given time
  • to make it easier for devs who are looking for their next issue to find one

What you need to know

@kimberlytanyh
Member Author

@ExperimentsInHonesty Thank you for the heads up! I will adjust my code accordingly before I rerun the analysis.

@github-actions github-actions bot added the To Update ! No update has been provided label Apr 14, 2023
@kimberlytanyh
Member Author

Weekly Update:

Progress: Found a way to identify pull requests among the issues retrieved through the GitHub API. Will redo all analyses done so far and try to improve the accuracy of the datasets.
Blockers: None
Availability: Saturday
ETA: ~6 hours
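
On identifying pull requests among retrieved issues: the REST API's issues endpoint also returns pull requests, and those items carry a `pull_request` key. A minimal sketch of separating the two is shown below; the function name is hypothetical and this is not necessarily the method used in the notebook.

```python
def split_issues_and_pulls(items):
    """Separate plain issues from pull requests in an issues-endpoint response."""
    issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return issues, pulls
```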

@kimberlytanyh
Member Author

Weekly Update:

Progress: Looked into data pipeline options to automate data updates for visualizations in Looker.
Blockers: Discussing preferred approach
Availability: Friday-Sunday this week
ETA: 6-10 hours

@kimberlytanyh kimberlytanyh removed the 2 weeks inactive An issue that has not been updated by an assignee for two weeks label May 7, 2023
@kimberlytanyh
Member Author

Weekly Update:

Progress: Working on streamlining the data cleaning code in Jupyter Notebook and adding automation components. Going to try using the Google Sheets API to create a data source for the Looker dashboard.
Blockers: Scheduling the notebook to run automatically. Deciding on best data source for Looker (in the midst of scheduling a working session with Chelsey, Karina, and Sophie).

Availability: Mon, Friday-Sunday next week
ETA: 6-10 hours

@kimberlytanyh
Member Author

kimberlytanyh commented May 19, 2023

Weekly Update:

Progress: Created a repository with Sophia, Chelsey, and Karina to automate running the Python data cleaning script using GitHub Actions. Next steps are to clean up the existing code for automation and data accuracy, try the Google Sheets API, and establish a data source for Looker.

Concepts/tools used for setting up the daily automatic run of the Python data cleaning script (in case we want to set up a wiki in the future):

  • Set up a new GitHub repository
  • Clone repository and upload .py version of Python script by committing it in Visual Studio Code
  • Under workflows, create a .yaml file and write a cron schedule (it tells GitHub Actions how often to run the script automatically)
  • Remember to go to "Settings" in the GitHub repo to set up an environment variable containing the GitHub token so that no one else can access the token string. Refer to it in the .yaml file and use the os library to retrieve the environment variable containing the GitHub token in the .py file.
  • Go to GitHub Workflow to see if the .py file or script runs successfully.
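
The token step in the list above (reading the GitHub token from an environment variable with the os library) might look like this in the .py file. This is a sketch under assumptions: the variable name `GITHUB_TOKEN` and the function name are hypothetical, and the workflow .yaml file would need to expose the stored value to the script under that same name.

```python
import os


def auth_headers(env_var="GITHUB_TOKEN", environ=os.environ):
    """Build request headers from a token stored in an environment variable."""
    token = environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
```

Keeping the token out of the script itself means the .py file can be committed without exposing the token string.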

Blockers:

Availability: Weekend and Mon-Fri next week, 12PM -7PM
ETA: 15+ hours

@github-actions github-actions bot added the To Update ! No update has been provided label Jun 2, 2023
@github-actions

github-actions bot commented Jun 2, 2023

@kimberlytanyh

Please add update using the below template (even if you have a pull request). Afterwards, remove the 'To Update !' label and add the 'Status: Updated' label.

  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (optional): "Add any pictures of the visual changes made to the site so far."

If you need help, be sure to either: 1) place your issue in the developer meeting discussion column and ask for help at your next meeting, 2) put a "Status: Help Wanted" label on your issue and pull request, or 3) put up a request for assistance on the #hfla-site channel. Please note that including your questions in the issue comments- along with screenshots, if applicable- will help us to help you. Here and here are examples of well-formed questions.

You are receiving this comment because your last comment was before Tuesday, May 30, 2023 at 12:15 AM PST.

@github-actions github-actions bot removed the To Update ! No update has been provided label Jun 9, 2023
@github-actions

github-actions bot commented Jun 9, 2023

@kimberlytanyh

Please add update using the below template (even if you have a pull request). Afterwards, remove the '2 weeks inactive' label and add the 'Status: Updated' label.

  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (optional): "Add any pictures of the visual changes made to the site so far."

If you need help, be sure to either: 1) place your issue in the developer meeting discussion column and ask for help at your next meeting, 2) put a "Status: Help Wanted" label on your issue and pull request, or 3) put up a request for assistance on the #hfla-site channel. Please note that including your questions in the issue comments- along with screenshots, if applicable- will help us to help you. Here and here are examples of well-formed questions.

You are receiving this comment because your last comment was before Tuesday, June 6, 2023 at 12:16 AM PST.

@github-actions github-actions bot added the 2 weeks inactive An issue that has not been updated by an assignee for two weeks label Jun 9, 2023
@kimberlytanyh
Member Author

kimberlytanyh commented Jun 10, 2023

Progress: Changing one more section of the code for automation and double-checking the accuracy of the data after cleaning (need to improve the accuracy of crediting the right number of small issues for agenda issues that have multiple assignees). Next step is to add the Python script for automation, then clean and create the dataset for the live dashboard on the number of issues available.

Blockers: None yet.
Availability: 6-8 hours
ETA: A few more weeks since it is an evolving and ongoing issue.
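
For the multiple-assignee case mentioned in this update, one possible crediting rule is to give every assignee on an issue one credit for that issue. That rule is an assumption for illustration only, since the thread does not state which rule the analysis uses. The sketch relies on the `assignees` and `labels` arrays that the REST API returns on each issue; the function name is hypothetical.

```python
from collections import Counter


def credit_assignees(issues, label_name):
    """Count issues carrying label_name per assignee, crediting every assignee once."""
    credits = Counter()
    for issue in issues:
        names = {label["name"] for label in issue.get("labels", [])}
        if label_name not in names:
            continue
        # Assumed rule: each assignee on the issue gets one full credit.
        for assignee in issue.get("assignees", []):
            credits[assignee["login"]] += 1
    return credits
```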

@kimberlytanyh kimberlytanyh added Status: Updated No blockers and update is ready for review and removed 2 weeks inactive An issue that has not been updated by an assignee for two weeks labels Jun 10, 2023
@github-actions github-actions bot removed the Status: Updated No blockers and update is ready for review label Jun 16, 2023
@github-actions github-actions bot added To Update ! No update has been provided 2 weeks inactive An issue that has not been updated by an assignee for two weeks and removed To Update ! No update has been provided labels Jun 23, 2023
@github-actions

@kimberlytanyh

Please add update using the below template (even if you have a pull request). Afterwards, remove the '2 weeks inactive' label and add the 'Status: Updated' label.

  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (optional): "Add any pictures of the visual changes made to the site so far."

If you need help, be sure to either: 1) place your issue in the developer meeting discussion column and ask for help at your next meeting, 2) put a "Status: Help Wanted" label on your issue and pull request, or 3) put up a request for assistance on the #hfla-site channel. Please note that including your questions in the issue comments- along with screenshots, if applicable- will help us to help you. Here and here are examples of well-formed questions.

You are receiving this comment because your last comment was before Tuesday, June 27, 2023 at 12:17 AM PST.

@kimberlytanyh
Member Author

Progress: Completed documentation of process for live issue availability dashboard (for GitHub class). Left to do: Edit Python script to add in data from other columns, add it to repository for automation, and finish creating dashboard.
Blockers: None yet. Might have to consult Data Science COP about auto running automation script.
Availability: 21 hours next week Mon-Fri.
ETA: Within the next week or two.

@kimberlytanyh kimberlytanyh added Status: Updated No blockers and update is ready for review and removed 2 weeks inactive An issue that has not been updated by an assignee for two weeks labels Jul 2, 2023
@kimberlytanyh kimberlytanyh changed the title Prework Analysis: Collecting Data on Issue Completion per Prework Author Prework Analysis: Collecting Data on Issue Completion per Prework Author and Creating Looker Dashboards to Uncover Insights Jul 2, 2023
@kimberlytanyh kimberlytanyh changed the title Prework Analysis: Collecting Data on Issue Completion per Prework Author and Creating Looker Dashboards to Uncover Insights Analysis: Collecting Data on Issue Completion per Prework Author and Creating Looker Dashboards to Uncover Insights Jul 2, 2023
@ExperimentsInHonesty ExperimentsInHonesty added the Dependency An issue is blocking the completion or starting of another issue label Jul 2, 2023
@kimberlytanyh kimberlytanyh moved this from In progress (actively working) to Ice box in Project Board Jul 7, 2023
@kimberlytanyh kimberlytanyh removed their assignment Jul 7, 2023
Projects
Project Board
  
Ice box
Development

No branches or pull requests

2 participants