Sprint planning and enhancements - ticket-ify this stuff and stick it in sprints #60

Closed
nkmeyers opened this issue May 26, 2020 · 2 comments
Labels: COVID-19 (this issue is top priority because of COVID-19), data (this issue is related to data), enhancement (new feature or request), index (this issue is related to indexing)

Comments

@nkmeyers
Collaborator

nkmeyers commented May 26, 2020

@ericleasemorgan writes: I am not ignoring y'all. Interesting discussions, and I encourage them to continue.

Concurrently, we need to: 1) get the whole thing running, and 2) then do enhancements. When it comes to Item 1, we have to:

  1. harvest/cache the data set (done)
       • download CORD-19 datasets as well as the corresponding metadata file; select subset of CORD-19, save the metadata in the database, transform the JSON into plain text, and save the plain text to the file system #13
  2. stuff the result into a database (done)
       • store extracted features into the database system #9
       • create a database node for enhanced Distant Reader backend, and initialize the database #12
  3. enhance the database with additional content (all but done). Which tickets are related to this, and does the completion of any other milestone or issue depend on it?
  4. index the database (all but done)
       • create an indexer node, and initialize an index #14
       • loop through the database to create a full text index of the selected subset #15
       • loop through the database to create a full text index of all of CORD-19 #22
       • re-create the Solr index #59
  5. make it easy for Team CORD to create study carrels (half done). Which tickets are related to this, and does the completion of any other milestone or issue depend on it?
  6. make many carrels (barely started). Which tickets are related to this, and does the completion of any other milestone or issue depend on it?
  7. create a Web presence (almost done)
       • create a node for serving HTTP #16
       • create a Web interface for searching the results #17
       • any others?

STUFF below here needs to be ticket-ified or aligned with tickets.
Once we get that far, which I anticipate will be by next Friday (May 29?), we can go for enhancements, and there are many possibilities:

  • add long titles to list of carrels
  • allow people other than Team CORD to create study carrels
  • create a study carrel out of the whole of CORD, which requires scalability
  • create a better stop word list
  • enable the whole "library" to be re-created
  • enhance author names with corresponding ORCIDs
  • enhance Web presence with additional logos and attributions
  • extract additional grammars
  • figure out a way to dynamically create a stop word list
  • generate additional measures of the documents
  • hyperlink bibliographic items to full text and other things
  • illustrate relationships using a network diagram
  • improve topic modeling
  • index study carrels
  • make everything FAIR
  • plot results on a map
  • plot results on a time line
  • refine entity output

As we enhance, we will repeatedly go back to Step #6 and re-build study carrels over and over; thus the carrels will be in a state of "continuous improvement".†

The whole thing is like playing guitar. First you need to learn how to hold it. Then you need to learn how to tune it. Then you need to learn a few chords. After that you need to learn how to "keep time". Once you get that far, then you can concentrate on bending notes, advancing to finger picking, playing syncopation, experimenting with alternative tunings, moving the chords up and down the fret board, improvising, playing in various styles, performing, recording, etc.

We are getting there. I assure you. Please continue to discuss all of these things, and once we get the Reader running, we will prioritize enhancements, divvy up the work, and make the whole something we can be proud of.

† I can't believe I actually used that phrase.

--
Eric M.

Originally posted by @ericleasemorgan in #58 (comment)

@nkmeyers added the data, enhancement, and index labels May 26, 2020
@molikd
Collaborator

molikd commented May 27, 2020

This ticket is related to #55. Going to close #55 in favor of this more succinct ticket.

here are tickets for things that you mentioned, where I could find them:

@molikd added this to Triage in The Reader Meets COVID-19 via automation May 27, 2020
@molikd added the COVID-19 label May 27, 2020
@molikd moved this from Triage to Tasks in The Reader Meets COVID-19 May 29, 2020
@molikd moved this from Tasks to In Progress in The Reader Meets COVID-19 May 29, 2020
@ericleasemorgan
Owner

To the best of my ability, things have been "ticket-ified".

@ericleasemorgan moved this from In Progress to Done in The Reader Meets COVID-19 Jun 25, 2020