Create script to check project files for missing data #386

Closed
2 of 4 tasks
cnk opened this issue Mar 22, 2020 · 42 comments
Labels
APBMDA Move issue to project board, Move to Done, Archive P-Feature: Project Info and Page A project's detail page (e.g. https://www.hackforla.org/projects/100-automations) Research Tasks for researchers role: back end/devOps Tasks for back-end developers role: product Product Management size: missing To Update ! No update has been provided

Comments

@cnk
Member

cnk commented Mar 22, 2020

Overview

We need an easy way to find out when data is missing from project cards, so that we can present a uniform UX and so that functions that rely on this data work across all projects.

Action Items

  • Determine desired output and process to follow
  • Identify data types (links, leadership, description, etc)
  • Identify which items projects might not have (e.g. live site)
  • Propose value to use for fields where data is intentionally missing

Requirements

  • Frequency: 1x/week
  • Output: ?
@zempo zempo self-assigned this Mar 30, 2020
@zempo zempo added the Research Tasks for researchers label Mar 30, 2020
@zempo
Member

zempo commented Mar 30, 2020

I will make a test repo with mock data to try some solutions.
It will expand upon @cnk's previous attempt to crawl through the project's directory.

  • research options
  • make test repo
  • check with Bonnie

@zempo zempo added the enhancement New feature or request suggestion label Mar 30, 2020
@alexandrastubbs alexandrastubbs added this to Ice box in Project Board via automation Jul 26, 2020
@alexandrastubbs alexandrastubbs moved this from Ice box to Prioritized backlog in Project Board Jul 26, 2020
@alexandrastubbs alexandrastubbs added the role: product Product Management label Jul 26, 2020
@ExperimentsInHonesty ExperimentsInHonesty moved this from Prioritized backlog to In progress in Project Board Jul 29, 2020
@rblaz001
Member

rblaz001 commented Aug 1, 2020

I'm just getting over some pretty bad food poisoning, so I haven't worked on this as much as I would have liked. I still need to make a 100 Automations repo to continue developing my Python script.

Action Item Progress

  • Determine desired output and process to follow
  • Identify data types (links, leadership, description, etc)
  • Identify which items projects might not have (e.g. live site)
  • Propose value to use for fields where data is intentionally missing

I have chosen to stick with a spreadsheet for the script's output. I'm currently outputting a spreadsheet with two columns: one for the project name and one for the list of missing data types.

The data types we are using are currently hard-coded in the script; they were obtained with another script that collected the set of data types used across all projects.

Set of Data Types
Screenshot from 2020-08-01 14-23-59

Project Missing Data Types
Screenshot from 2020-08-01 14-23-26

Note: I'm currently not taking nested data types into account, for example name, role, and links under "Leadership" in the following image. This is something I want to look into, but I might have to reconsider how I handle my output and determine which nested data types are important.

Screenshot from 2020-08-01 14-43-56

@rblaz001
Member

rblaz001 commented Aug 9, 2020

I now have a version of the script that outputs unused data types and missing data types for all projects.
I still need to create a 100 Automations repo and add my work to it.
The remaining work is quality-of-life edits to the script, including proper command-line support.

The data types we are using are currently hard-coded in the script; they were obtained with another script that collected the set of data types used across all projects. Nested data types are denoted by comma-delimited entries.

Set of Data Types
Screenshot from 2020-08-09 09-56-14

Project Unused Data Types
Screenshot from 2020-08-09 09-55-02

Project Missing Data Types (Currently Only One Missing)
Screenshot from 2020-08-09 09-55-21

  • Need to change the output to properly show where the missing data is nested.

Example of role missing from above entry
Screenshot from 2020-08-09 09-58-00

@rblaz001
Member

@alexandrastubbs
Member

@ExperimentsInHonesty will talk to @rblaz001 about topic tags

@ExperimentsInHonesty
Member

ExperimentsInHonesty commented Aug 16, 2020

@rblaz001 Please fill this out so that we can list you on the 100automations.org site, which will soon exist:
create a project card for a 100 automations project

@rblaz001
Member

Spreadsheet for current unused/missing data types.

Note: Unused Data Types are data types that are never used in a project card. Missing Data Types are nested data types that are missing even though their parent data types are present (this can be one or more instances).

missingDataTypes.xlsx
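The unused/missing distinction in the note above can be sketched in Python (the language of the original script). This is a minimal sketch, not the script from this thread: it assumes each project card has already been reduced to a set of comma-joined data-type paths (e.g. "leadership,role"), and `parent_of` and `classify` are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def parent_of(data_type: str) -> Optional[str]:
    """Return the parent path of a comma-joined data type, or None if top-level."""
    head, sep, _ = data_type.rpartition(",")
    return head if sep else None


def classify(all_types: Set[str],
             used_by_project: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Split each project's absent data types into 'unused' and 'missing'.

    missing: a nested type is absent although its parent type is present.
    unused:  anything else that is absent (top-level, or parent also absent).
    """
    report = {}
    for project, used in sorted(used_by_project.items()):
        unused, missing = [], []
        for data_type in sorted(all_types - used):
            parent = parent_of(data_type)
            if parent is not None and parent in used:
                missing.append(data_type)
            else:
                unused.append(data_type)
        report[project] = {"unused": unused, "missing": missing}
    return report
```

Because it works on path sets, a `role` absent from only one of several `leadership` entries is not detected (another entry already contributes "leadership,role"); catching individual instances requires walking each list item.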

@rblaz001
Copy link
Member

rblaz001 commented Aug 16, 2020

Notes From Bonnie:

  • Research GitHub Actions integration for creating issues
  • Add a new validation method: provide a source .md collection file listing every data type in the project with either a require value or an ignore value. The script will use this file to determine how to parse missing data types.
  • Add functionality to output a report of data types missing from the source .md file. This will ensure the source .md file is kept up to date with requirements.
  • Add JSON as an additional output option
  • Add GitHub Actions integration
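The additional JSON output option could sit alongside the existing two-column spreadsheet roughly like this. A sketch assuming the report is a dict mapping project file name to a list of missing data types; `render_report` is a hypothetical helper, and CSV stands in for the spreadsheet.

```python
import csv
import io
import json
from typing import Dict, List


def render_report(report: Dict[str, List[str]], fmt: str = "csv") -> str:
    """Render {project file: [missing data types]} as CSV or JSON text."""
    rows = sorted(report.items())
    if fmt == "json":
        return json.dumps([{"project": p, "missing": types} for p, types in rows], indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["project", "missing data types"])
        for project, types in rows:
            # One row per project; the list is joined into a single cell.
            writer.writerow([project, "; ".join(types)])
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```

Keeping both formats behind one function lets a command-line flag pick the output without duplicating the report logic.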

@alexandrastubbs alexandrastubbs added the role: back end/devOps Tasks for back-end developers label Aug 16, 2020
@alexandrastubbs
Member

@rblaz001 do you have a list of all the data types? If so, can you upload it to Google Drive and share the link here?

@rblaz001
Member

@alexandrastubbs I have to make a small adjustment to the script to output all the currently used data types. I'm not home at the moment but I'll be able to upload the file and link it before tomorrow morning.

@rblaz001
Member

@alexandrastubbs I added an extra sheet labeled "Used Data Types" that has a list of data types used across the project cards.

https://docs.google.com/spreadsheets/d/1w59UfwbGiEAoBXjAxl4uwQe3xk5j6yFd3NxBMOm510Y/edit?usp=sharing

Note: The data types are not ordered. Comma-separated data types represent nesting.
For example, One,Two,Three would look like the following in the project card:

One:
    Two:
        Three: value
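The comma-joined notation maps directly onto a recursive walk of the parsed front matter. A minimal sketch, assuming each card's front matter has already been parsed into Python dicts and lists (for example with PyYAML, not shown); `flatten_types` and `collect_all_types` are hypothetical names. Lists are treated as transparent, so keys inside each `leadership` entry appear as "leadership,<key>".

```python
from typing import Any, Dict, Set


def flatten_types(node: Any, prefix: str = "") -> Set[str]:
    """Collect comma-joined key paths, e.g. {'One': {'Two': {'Three': 1}}} -> 'One,Two,Three'."""
    paths: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix},{key}" if prefix else str(key)
            paths.add(path)
            paths |= flatten_types(value, path)
    elif isinstance(node, list):
        # Every list item is recorded under the list's own path.
        for item in node:
            paths |= flatten_types(item, prefix)
    return paths


def collect_all_types(projects: Dict[str, Any]) -> Set[str]:
    """Union of the data types used across all project cards."""
    all_types: Set[str] = set()
    for front_matter in projects.values():
        all_types |= flatten_types(front_matter)
    return all_types
```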

@rblaz001
Member

rblaz001 commented Aug 23, 2020

Progress Report
Progress - I was able to finish the refactor, add JSON support, and update the README. I was also able to update the repo name and logo to include Jekyll.
Blockers - Need to show progress to the PM and get notes
Availability: 3-4 days, depending on possible job interviews
ETA:

  • Automation Card: end of the week at the latest, but shooting for much sooner
  • Validate Through Template Feature: 7-9 days; the difficulty level is relatively low, but I'm catching up on documentation from the refactor plus food-oasis tasks.
  • GitHub Action Issues: TBD, began research

Work In Progress
Progress with automation card:
100Automations/Website#29

Finishing refactor documentation:
100Automations/jekyll-gather-data-types#2

Then working on validate through template feature:
100Automations/jekyll-gather-data-types#3

I was also able to look into the GitHub Actions API, and it should be relatively easy to automate creating issues for missing data types. The only blocker right now is working out how to manage issues so that redundant issues are not created.

@alexandrastubbs
Member

@rblaz001 to rename from 'used data types' to 'all data types'

@rblaz001
Copy link
Member

rblaz001 commented Aug 30, 2020

Progress - My availability changed last week, so I was unable to get any work done
Blockers - None
Availability: No more interviews or scheduling conflicts, will be available all week

Priority:

  • Finish automation card
  • Rename 'used data types' to 'all data types'
  • Add validate through template feature

Task after priority finished:

  • GitHub actions for automatic issue creation

@ExperimentsInHonesty
Member

@rblaz001 checking to see if you are able to put in an update on this issue before tomorrow.

@rblaz001
Member

rblaz001 commented Sep 12, 2020

Progress

  • Finished the automation card, renamed 'used data types' to 'all data types' in the output, and finished updating the documentation.

Next step is to add validate through template.

  • I will work on this tonight and should either finish a first draft tonight or update with an expected completion time.
  • Finished the MVP for validate through template; I still need to update the documentation and README before merging to master.

Then I will begin implementing GitHub Actions.

Blockers - Moving to Austin and still trying to find an apartment, so I have very limited time

Availability - Limited

@ExperimentsInHonesty
Member

Do UX requirement gathering.
Bonnie will post on the CfA Slack asking for other people who have Jekyll sites and use collections.

@rblaz001
Member

rblaz001 commented Sep 13, 2020

template.md

---
alt: important
alt-hero: important
completed-contact: important
description: important
hide: ignore
identification: important
image: important
image-hero: important
leadership:
  - links:
      github: important
      linkedin: important
      slack: important
    name: important
    picture: important
    role: important
links:
  - name: important
    url: important
location: important
looking:
  - category: important
    skill: important
partner: important
status: important
technologies: important
title: important
tools: important
---
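One way the validate-through-template feature could read this file: walk the template and the card together, skip `ignore` fields, and flag `important` fields that are absent or empty. In this sketch a list in the template holds a single entry that describes every item of the matching list in the card, so a `role` missing from one leader is reported with that leader's index. That interpretation is an assumption, both inputs are assumed to be already-parsed YAML, and `validate` and `is_empty` are hypothetical.

```python
from typing import Any, Dict, List


def is_empty(value: Any) -> bool:
    """Treat missing (None), blank strings, and empty lists/dicts as empty."""
    return value is None or value == "" or value == [] or value == {}


def validate(template: Dict[str, Any], card: Any, path: str = "") -> List[str]:
    """Return paths of 'important' template fields that are absent or empty in card."""
    problems: List[str] = []
    for key, rule in template.items():
        key_path = f"{path},{key}" if path else key
        value = card.get(key) if isinstance(card, dict) else None
        if rule == "ignore":
            continue
        if is_empty(value):
            problems.append(key_path)
        elif isinstance(rule, list) and isinstance(value, list):
            # The single template entry describes every item in the card's list.
            for index, item in enumerate(value):
                problems += validate(rule[0], item, f"{key_path}[{index}]")
        elif isinstance(rule, dict) and isinstance(value, dict):
            problems += validate(rule, value, key_path)
        elif isinstance(rule, (list, dict)):
            # Shape mismatch, e.g. a plain string where the template expects a list.
            problems.append(key_path)
    return problems
```

For example, a card whose second leader has `role:` left blank yields "leadership[1],role", which tells the PM exactly which entry to fix.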

@rblaz001
Member

rblaz001 commented Sep 13, 2020

@cnk
Member Author

cnk commented Mar 4, 2021

What do items like this mean?

{
    "filename": "food-oasis.md",
    "errors": [
        "is not of a type(s) string",
        "is not of a type(s) array"
    ]
}

@akibrhast
Member

akibrhast commented Mar 4, 2021

Ah, my bad. Let me work on a more detailed error report.
To answer your question: the two errors occur because the fields ['looking', 'role'] in the images below are empty/null, whereas my JSON schema expects non-null values of the types named in the errors. @cnk

[Screenshots: the empty looking and role fields]

@akibrhast
Member

Edited #386 (comment) to give a more detailed report.
I was really hoping to get something along the lines of:

"error":[
 {
 "message": "is not of type string",
 "expected": "string",
 "given": "null",
 "stack": "on line 256 ,in 'message': null"
 }
]

but I can't seem to find the relevant information... :(
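Part of the desired error shape (message, expected, given) can be produced with a hand-rolled type check that records the key path instead of a line number; line numbers would need a YAML parser that keeps source positions. This is a sketch only: `check_types` and the type-name mapping are assumptions, and it is not how the schema validator used in this thread formats its output.

```python
from typing import Any, Dict, List

# Map Python values from parsed YAML to JSON Schema type names.
JSON_TYPE_NAMES = {
    str: "string", list: "array", dict: "object", bool: "boolean",
    int: "integer", float: "number", type(None): "null",
}


def json_type(value: Any) -> str:
    # Exact type lookup, so True maps to "boolean" rather than "integer".
    return JSON_TYPE_NAMES.get(type(value), type(value).__name__)


def check_types(expected: Dict[str, str], card: Dict[str, Any]) -> List[Dict[str, str]]:
    """Report top-level fields whose value type differs from the expected type."""
    errors = []
    for key, want in expected.items():
        if key not in card:
            continue  # absent keys belong to a separate "required" check
        got = json_type(card[key])
        if got != want:
            errors.append({
                "message": f"is not of type {want}",
                "expected": want,
                "given": got,
                "path": key,
            })
    return errors
```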

@cnk
Member Author

cnk commented Mar 5, 2021

"looking" (and probably "technologies") can be empty. Can you express that in a JSON schema, i.e. fields that must exist but can be empty?

@akibrhast
Member

Yep, not a problem. I can make looking accept either a string or null. But a project should always have at least one technology, no? I would be hesitant to allow that to be empty...

@akibrhast
Member

Updated: #386 (comment)
An empty looking field no longer reports an error, although it is still a required key.
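In JSON Schema, "must exist but may be empty" is usually expressed by listing the key under `required` while allowing `null` in its `type`; `minItems` can keep a list such as `technologies` from being empty. (In YAML, a key written as `looking:` with no value parses as null.) The sketch below shows that idea as a Python dict plus a tiny checker for just those three keywords. It is illustrative, not the schema from this thread: it types `looking` as an array because template.md shows it as a list, and the error strings only imitate the ones quoted above.

```python
from typing import Any, Dict, List

# Hypothetical schema fragment in JSON Schema style.
SCHEMA = {
    "required": ["looking", "technologies"],  # both keys must exist...
    "properties": {
        "looking": {"type": ["array", "null"]},  # ...but looking may be null
        "technologies": {"type": "array", "minItems": 1},  # at least one technology
    },
}

# Python types that correspond to the JSON Schema type names used above.
JSON_TYPES = {"array": list, "null": type(None), "string": str, "object": dict}


def check(schema: Dict[str, Any], card: Dict[str, Any]) -> List[str]:
    """Apply only the required, type, and minItems keywords to a parsed card."""
    errors = []
    for key in schema.get("required", []):
        if key not in card:
            errors.append(f'requires property "{key}"')
    for key, rules in schema.get("properties", {}).items():
        if key not in card:
            continue
        value = card[key]
        allowed = rules.get("type")
        if allowed is not None:
            names = allowed if isinstance(allowed, list) else [allowed]
            if not any(isinstance(value, JSON_TYPES[name]) for name in names):
                errors.append(f"{key}: is not of a type(s) {','.join(names)}")
                continue
        min_items = rules.get("minItems")
        if min_items is not None and isinstance(value, list) and len(value) < min_items:
            errors.append(f"{key}: does not meet minimum length of {min_items}")
    return errors
```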

@akibrhast akibrhast self-assigned this Mar 15, 2021
@akibrhast akibrhast added the Status: Help Wanted Internal assistance is required to make progress label Mar 15, 2021
@akibrhast
Member

akibrhast commented Mar 15, 2021

Currently I have an on: workflow_dispatch action running that checks the files and generates a validation report. @ruben1s

I need feedback on where you want to go next with this, @cnk @ExperimentsInHonesty.

My Proposal

  • Have this GitHub Action run every 3.5 days
  • On action start:
    1. Check whether an issue created by the action already exists. This should be fairly easy to implement using this endpoint:
       https://api.github.com/repos/hackforla/website/issues?state=open&creator=akibrhast
    2. If an open issue created by this bot action exists:
      • Quit the action
    3. If no open issue created by this bot action exists:
      • Generate the validation report and create an issue with the report data
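The check-then-quit steps of the proposal could be sketched as below. Assumptions: the action has already fetched the endpoint's JSON (the HTTP call is omitted), and report issues are recognized by a fixed title prefix, since the same account may open other issues; `open_issues_url` and `should_create_issue` are hypothetical helpers. GitHub's issues endpoint also returns pull requests, which carry a `pull_request` key, and results are paginated (30 per page by default).

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def open_issues_url(owner: str, repo: str, creator: str) -> str:
    """Build the 'open issues by creator' query from the proposal."""
    query = urlencode({"state": "open", "creator": creator})
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def should_create_issue(open_issues: List[Dict[str, Any]], title_prefix: str) -> bool:
    """Return False (quit the action) if a report issue is already open."""
    for issue in open_issues:
        if "pull_request" in issue:
            continue  # the issues endpoint also lists pull requests
        if issue.get("title", "").startswith(title_prefix):
            return False
    return True
```

If the action authenticates with the default GITHUB_TOKEN, the issues it opens are typically attributed to github-actions[bot], so that login, rather than a personal one, would be the creator to query.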

Example Issue that would be created :

Project Validation Failing [month/date/year]

Overview

The following report was generated because the project files do not match the validation schema. Please review the report below under Project Validation Error Report and apply the necessary fixes.

Project Validation Error Report
[
    {
        "file": "100 Automations",
        "errors": [
            {
                "message": "requires property \"partner\"",
                "stack": "instance requires property \"partner\""
            },
            {
                "message": "requires property \"tools\"",
                "stack": "instance requires property \"tools\""
            }
        ]
    },
    {
        "file": "311 Data",
        "errors": [
            {
                "message": "requires property \"tools\"",
                "stack": "instance requires property \"tools\""
            }
        ]
    }
]

Action Items

  • Please apply the fixes to the files listed in the resources below using the error report above

Resources/Instructions
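The example issue above could be generated from the report data roughly like this. A sketch assuming the report is the list of file/errors objects shown; `render_issue` is hypothetical, and the Markdown headings only approximate the example's layout. The code fence inside the body is built from a variable so it cannot clash with the surrounding Markdown.

```python
import json
from datetime import date
from typing import Any, Dict, List, Tuple

FENCE = "`" * 3  # Markdown code fence for the issue body


def render_issue(report: List[Dict[str, Any]], day: date) -> Tuple[str, str]:
    """Return (title, body) for the validation issue."""
    title = f"Project Validation Failing [{day:%m/%d/%Y}]"
    body = "\n".join([
        "### Overview",
        "The following report was generated because the project files do not match "
        "the validation schema. Please review the report below and apply the "
        "necessary fixes.",
        "",
        "### Project Validation Error Report",
        FENCE + "json",
        json.dumps(report, indent=4),
        FENCE,
        "",
        "### Action Items",
        "- [ ] Apply the fixes to the files listed in the report above",
    ])
    return title, body
```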

@ExperimentsInHonesty
Member

@akibrhast I like the proposal.

  • Still trying to figure out how this would be used.
    • concerned that if it has one issue that has a lot of different projects in it, it will be hard to parse out. We can't get 10 PMs to weigh in on one issue.
    • concerned about us determining that a project was not going to have something in it (like no partner) and no way to get it to stop flagging it.

Interested to hear your suggestions for improvement based on these concerns.

@akibrhast
Member

akibrhast commented Mar 16, 2021

concerned about us determining that a project was not going to have something in it (like no partner) and no way to get it to stop flagging it.

Currently the schema is based on this comment: #386 (comment) ('Current Project Must Have's').

According to that comment, all current projects must have those keys. However, as the image below shows, the partner key is commented out for some reason in https://raw.githubusercontent.com/hackforla/website/gh-pages/_projects/100-automations.md. That should not be the case: it's fine if the partner value is empty, but the key should still exist.

[Screenshot: the commented-out partner key in 100-automations.md]
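Because a commented-out key vanishes entirely once the YAML is parsed, a schema check can only report that the property is required; it cannot tell the author the key is still there behind a `#`. A plain-text scan of the front matter could point that out. A minimal sketch: `commented_out_keys` and `front_matter_lines` are hypothetical, and the regex assumes simple "# key:" comments on their own line.

```python
import re
from typing import Iterable, List

# A YAML comment that looks like a disabled key, e.g. "# partner: ..."
COMMENTED_KEY = re.compile(r"^\s*#\s*([A-Za-z][\w-]*)\s*:")


def front_matter_lines(text: str) -> List[str]:
    """Return the lines between the opening and closing '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            return lines[1:end]
    return []  # no closing marker: treat as having no front matter


def commented_out_keys(text: str, required: Iterable[str]) -> List[str]:
    """List required keys that appear in commented-out front matter lines."""
    wanted = set(required)
    found = []
    for line in front_matter_lines(text):
        match = COMMENTED_KEY.match(line)
        if match and match.group(1) in wanted:
            found.append(match.group(1))
    return found
```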

According to the original issue's action item (#386 (comment)):

  • Identify which items projects might not have (e.g. live site)

Some details that need to be consolidated are:

  • which keys a current project MUST ALWAYS have
  • which values a current project MUST ALWAYS have (example: title, id)
  • which values may be OPTIONAL for a current project (e.g. partners)
  • what type of value each field accepts; for example, a Slack link should always look like https://hackforla.slack.com/team/xxxxxx

The same questions apply to Completed Projects and On Hold Projects.
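Once those answers exist, they could be captured as data, one rule set per project category, with a format check for fields such as Slack links. Everything specific here is a placeholder: the category names come from this comment, but the key lists and the Slack ID pattern ([A-Za-z0-9]+ standing in for xxxxxx) are assumptions, and `RULES` and `check_card` are hypothetical.

```python
import re
from typing import Any, Dict, List

# Placeholder rules per category; the real lists are what this comment asks to consolidate.
RULES = {
    "Current": {
        "keys": {"title", "identification", "partner", "technologies"},  # key must exist
        "non_empty": {"title", "identification"},  # value must also be filled in
    },
    "Completed": {"keys": {"title", "identification"}, "non_empty": {"title"}},
    "On Hold": {"keys": {"title", "identification"}, "non_empty": {"title"}},
}

# Assumed member-ID charset; the thread only shows the placeholder "xxxxxx".
SLACK_TEAM_URL = re.compile(r"^https://hackforla\.slack\.com/team/[A-Za-z0-9]+$")


def check_card(card: Dict[str, Any], category: str) -> List[str]:
    rules = RULES.get(category)
    if rules is None:
        return [f"unknown category: {category!r}"]
    errors = [f"missing key: {key}" for key in sorted(rules["keys"] - set(card))]
    # Present-but-empty values (e.g. partner) pass unless listed in non_empty.
    errors += [f"empty value: {key}" for key in sorted(rules["non_empty"])
               if key in card and card[key] in (None, "", [])]
    for leader in card.get("leadership") or []:
        if not isinstance(leader, dict):
            continue
        slack = (leader.get("links") or {}).get("slack")
        if slack and not SLACK_TEAM_URL.match(slack):
            errors.append(f"bad slack link: {slack}")
    return errors
```

Separating "key must exist" from "value must be filled in" is what lets an intentionally empty partner pass without silencing the check entirely.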

Changing the schema is relatively simple. As you can see, @cnk made a comment (#386 (comment)) and the schema was changed soon after.


concerned that if it has one issue that has a lot of different projects in it, it will be hard to parse out. We can't get 10 PMs to weigh in on one issue.

Maybe generate a single issue per file?
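A per-file variant might look like this: one issue per failing file, with a title that includes the file name, so each project's PM gets their own issue and the action can skip files that already have one open. A sketch assuming the report format from the example issue and a set of already-fetched open issue titles; `issues_per_file` is hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def issues_per_file(report: List[Dict[str, Any]],
                    open_titles: Set[str]) -> List[Tuple[str, str]]:
    """Return (title, body) pairs for files that need a new issue."""
    new_issues = []
    for entry in report:
        if not entry["errors"]:
            continue  # file passed validation
        title = f"Project Validation Failing: {entry['file']}"
        if title in open_titles:
            continue  # an issue for this file is already open
        body = "\n".join(f"- {error['message']}" for error in entry["errors"])
        new_issues.append((title, body))
    return new_issues
```

Keying on a stable title is the simplest deduplication; a label or a marker in the issue body would work as well.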

@akibrhast
Member

Progress

None in the last 5 days
#386 (comment)

Blockers

Awaiting feedback from @ExperimentsInHonesty @cnk

Availability

Here and There

ETA

N/A

@akibrhast akibrhast added Status: Updated No blockers and update is ready for review Status: Help Wanted Internal assistance is required to make progress and removed Status: Help Wanted Internal assistance is required to make progress labels Mar 21, 2021
@akibrhast akibrhast moved this from Prioritized backlog to Links / Questions / In Review in Project Board Mar 21, 2021
@ExperimentsInHonesty
Member

Met with @akibrhast - he will add notes to this doc.

@ExperimentsInHonesty ExperimentsInHonesty removed Status: Updated No blockers and update is ready for review Status: Help Wanted Internal assistance is required to make progress labels Mar 24, 2021
@ExperimentsInHonesty ExperimentsInHonesty moved this from Links / Questions / In Review to In progress in Project Board Mar 26, 2021
@qiqicodes qiqicodes added the To Update ! No update has been provided label May 9, 2021
@akibrhast akibrhast moved this from In progress to Ice box in Project Board May 26, 2021
Project Board automation moved this from Ice box to Done Jun 12, 2021
@ExperimentsInHonesty ExperimentsInHonesty removed this from Done in Project Board May 8, 2022
@ExperimentsInHonesty ExperimentsInHonesty added the APBMDA Move issue to project board, Move to Done, Archive label May 8, 2024
Development

No branches or pull requests

10 participants