Facilitate UAST-based tasks on PGA #18

Closed
4 tasks
smola opened this issue Mar 27, 2018 · 10 comments
Labels
Discarded P0 high priority T:Data Retrieval Data Retrieval objectives T:Engineering Engineering objectives

Comments

@smola
Contributor

smola commented Mar 27, 2018

Parent objective: #11
Progress: 0%

  • [P0] Create UAST dataset based on PGA consumable by ML team
  • [P0] Define a format that can store UAST in a space-efficient way
  • [P1] Publish UAST dataset
  • [P2] Publish an index for the UAST dataset that DevRel can use to improve download criteria for it
@smola smola added the P0 high priority label Mar 27, 2018
@smola smola added the T:Engineering Engineering objectives label Mar 27, 2018
@dennwc

dennwc commented Mar 27, 2018

@smola I think we should add an objective for UAST compression. I have a few options in mind to cut the size of the dataset.

@eiso
Member

eiso commented Mar 27, 2018

@dennwc I am about to invite the @src-d/language-analysis team to the Slack channel to start creating their OKRs. This would be a great example of something you can edit/add. OKRs are designed to also be bottom-up, not just top-down in the company.

@bzz
Contributor

bzz commented Mar 28, 2018

Just something that we have already discussed internally and externally and that might be nice to keep in mind: people are excited about the ability to query the UAST dataset on Google's BigQuery.

@smola smola added the T:Data Retrieval Data Retrieval objectives label Apr 2, 2018
@campoy
Contributor

campoy commented Apr 3, 2018

@smola, could you explain what you mean by "an index for the UAST dataset"?

Also, I'm wondering whether we could make it so people need to somehow register to download PGA.
Do we already keep track of the number of downloads?

@marnovo
Member

marnovo commented Apr 5, 2018

[OKRs Review] #25 removed from parent because too detailed.

@marnovo
Member

marnovo commented Apr 5, 2018

[OKRs Review] The problem/need is not yet defined well enough to justify working on this as an objective. Discarding.

@smola
Contributor Author

smola commented Apr 5, 2018

@campoy An index you can use to build a selective download tool for different chunks of the dataset. Similar to what we have for PGA.

@campoy
Contributor

campoy commented Apr 5, 2018

So basically you can download the UASTs rather than the siva files, @smola?
Cool!

@smola
Contributor Author

smola commented Apr 6, 2018

@campoy Right. That would be the idea. Note, however, that this is discarded as an engineering objective for this quarter. We'll discuss the viability of this later on though.

@eiso eiso removed the Discarded label Apr 6, 2018
@eiso
Member

eiso commented Apr 6, 2018

@marnovo instead of using discarded, I am closing the issue.

@eiso eiso closed this as completed Apr 6, 2018
@eiso eiso added the Discarded label Apr 6, 2018
6 participants