Implement Open Ethnographer to work with Discourse #10

Closed
albertocottica opened this issue Jul 12, 2017 · 8 comments

@albertocottica
Member

The goal is to have OpenEthnographer working inside Discourse, similarly to how it worked inside Drupal. Plus, to base it on a data model for codes and annotations that is fully prepared for the RECODE project.

Vocabulary:

  1. Contributions: units of text created by platform users. Includes: nodes + comments (in Drupal), topics + replies (in Discourse).
  2. Codes: keywords assigned by the ethnographers to parts of the data.
  3. Annotations: entities, created by ethnographers, that associate codes to snippets of text, themselves associated to contributions.

More on the vocabulary in the OpenCare Data Management Plan.

Contributions have:

  1. ID
  2. Title (Free text)
  3. Content (free text)
  4. Author ID
  5. timestamp
  6. group ID (Drupal) / category ID (Discourse). Not necessary for comments or replies, as it can be obtained by a join on the node or topic.
  7. node ID (Drupal) / topic ID (Discourse). Not applicable for nodes and topics.
  8. parent ID. Applies to comments/replies that reply to other comments/replies.

Codes have:

  1. ID
  2. Name (Free Text)
  3. Description (Free Text)
  4. Author ID
  5. Timestamp
  6. Parent ID. The ID of the parent code in the hierarchy.

Note: the hierarchy of codes is not a tree. A code can be a child of two parents. This allows top-level parent codes to indicate studies (for example OpenCare, or Stewardship).

Annotations have:

  1. ID
  2. "Snippet". This is the meaningful part of the text that is associated to the code. Can coincide with the whole contribution. Always exactly one per annotation.
  3. Code IDs. Can be more than one.
  4. Contribution ID. Links back to the contribution, and through it to the thread, group/category etc.
  5. Author ID. Identifies the author of the Annotation (ethnographer), not the author of the contribution.
  6. Timestamp.
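
The three entities above can be sketched as plain records. This is only an illustration of the field lists, not the actual schema: all class, field, and function names are assumptions, and `parent_ids` is a list because the note above says a code can have more than one parent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Contribution:
    id: int
    content: str
    author_id: int
    timestamp: datetime
    title: Optional[str] = None        # assumption: replies may lack a title
    category_id: Optional[int] = None  # group ID (Drupal) / category ID (Discourse)
    topic_id: Optional[int] = None     # node ID / topic ID; None for nodes and topics
    parent_id: Optional[int] = None    # set only when replying to another comment/reply


@dataclass
class Code:
    id: int
    name: str
    description: str
    author_id: int
    timestamp: datetime
    # Assumption: stored as a list, since the hierarchy is not a tree.
    parent_ids: List[int] = field(default_factory=list)


@dataclass
class Annotation:
    id: int
    snippet: str            # exactly one snippet per annotation
    code_ids: List[int]     # one or more codes
    contribution_id: int
    author_id: int          # the ethnographer, not the contribution's author
    timestamp: datetime


def category_of(contribution: Contribution,
                by_id: Dict[int, Contribution]) -> Optional[int]:
    """Resolve a reply's category through its topic (the join mentioned above)."""
    if contribution.topic_id is None:
        return contribution.category_id
    return by_id[contribution.topic_id].category_id
```

An annotation reaches its category in two hops: `contribution_id` gives the contribution, and `category_of` follows `topic_id` when that contribution is a reply.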
@albertocottica
Member Author

The platform serves a screen with the full history and participation stats for each user. Example, for username moe: https://edgeryders.eu/u/moe/summary . There is also a summary version in the Users item in the hamburger menu. And it is wonderful to use: filterable, sortable by any statistic with one click.

There are obvious data structure implications. Implicitly, a network is being induced where the users are nodes, and there are two layers of edges: A likes-the-content-of B and A replies-to-the-content-of B. This means that, in the data structure, there is a full users-to-content bipartite network. This could probably be harnessed to build simple APIs geared towards exporting networks.
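
A minimal sketch of how the two edge layers could be derived from the user-to-content data. The input shapes (`posts` as dicts with `id`, `author`, `reply_to`; `likes` as `(user, post_id)` pairs) and the function name are assumptions made for illustration, not the Discourse schema or API.

```python
from collections import Counter


def build_edge_layers(posts, likes):
    """Project user-to-content links onto two weighted user-to-user edge layers.

    posts: list of dicts with keys "id", "author", "reply_to" (post id or None).
    likes: list of (user, post_id) tuples.
    Returns {"replies": {(a, b): n}, "likes": {(a, b): n}}, where (a, b) means
    "a replied to / liked the content of b" and n counts how often.
    """
    author_of = {p["id"]: p["author"] for p in posts}

    replies = Counter()
    for p in posts:
        target = p.get("reply_to")
        if target is not None and target in author_of:
            replies[(p["author"], author_of[target])] += 1

    liked = Counter()
    for user, post_id in likes:
        if post_id in author_of:
            liked[(user, author_of[post_id])] += 1

    return {"replies": dict(replies), "likes": dict(liked)}
```

Each layer is a weighted edge list keyed by ordered user pairs, which converts directly into the edge tables most network tools accept.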

@guywiz

guywiz commented Jul 25, 2017

It seems to me you need to add something to codes to better organize the hierarchy. That the hierarchy is not a tree is fine. But it seems that you plan to use codes of different natures in the hierarchy, as in your example of attaching opencare and stewardship as parents of a code. Those two are not codes per se; they indicate a link between a code and objects (here, projects).

So some codes would only be codes, while others (necessarily non-leaf, I guess) would be there as members of the taxonomy to enrich a code with information about its meaning, its scope, etc.

?

@tanius tanius changed the title from "Implement a RECODE-complete data model for Open Ethnographer codes and annotations" to "Implement OpenEthnographer to work with Discourse" Jul 25, 2017
@tanius
Member

tanius commented Jul 25, 2017

Time plan: as agreed with Alberto, until early 2017-09 to make OpenEthnographer work with Discourse, until late 2017-10 to import the existing OpenEthnographer data from Drupal to Discourse.

Rough implementation plan:

  • There will be no implementation of any Drupal reverse import script at all, because Amelia's coding activity can easily wait until early / mid 2017-09, and we don't want to throw away development effort.
  • First, test if the out-of-the-box Annotator.js component can or cannot be used with Discourse, or other Ember applications.
  • Create it as a Discourse plugin or other type of extension that provides (1) Annotator.js, (2) our Annotator.js backend in Ruby, (3) a per-user admin setting to enable Annotator.js usage for them (which also keeps the site at speed for everyone else).
  • It would be convenient to just have to add a certain extension to a Discourse URL (like .oe) to arrive at a page variant with Annotator.js and the rest of the OpenEthnographer interface.
  • Should not be tied much to Discourse at all. That means: do not re-use the Discourse tags feature; instead create a separate data structure for it. All data should be saved in its own tables. The way it detects topic and comment boundaries (by jQuery selector) should be configurable, not tied to Discourse. The intention is to make integration with other platforms simple enough later on, if there is a need.
  • OpenEthnographer should get a simple administrative interface. It does not have to be integrated with Discourse at all, but can look like the existing /sidekiq or /logs interfaces.
  • Ideally, there should be login integration. Meaning, the /annotations page would only be accessible for users for whom OpenEthnographer has been enabled.
  • Annotator.js has a nice new top bar, see http://annotatorjs.org/ . This can be utilized as well.
  • Open Ethnographer should provide the same (or a very similar) JSON API as Drupal did, to export its data. This will make the Graphryder tool run without any changes needed, as it gets both the nodes / comments network and the OpenEthnographer annotations as JSON views from Drupal right now.
  • The importing of existing data will be a bit tricky, as word indexes can be off due to automatic content changes (however, except for comment titles, automatic changes should only have affected tags, so should be without effect on word indexes).

@tanius tanius self-assigned this Jul 25, 2017
@albertocottica
Member Author

Great. One small facility will be needed to manage the hierarchy of codes.

@albertocottica
Member Author

Note on tags and utilities to research work.

  • Sometimes a staff member finds content that is relevant for an ongoing project but is not in the same category as the project itself. Normally this would be older content that gets re-used in newer projects. A tag of the form project-PROJECTNAME is then assigned to the topics in question. The topics can then be aggregated for data analysis through API calls to /topics/PROJECTNAME.json.
  • Open Ethnographer needs a "queue" of content to be coded. A possible solution would be to implement a coded tag to assign to all topics that were coded. The queue for PROJECTNAME would then be obtained by pulling in all content with the project-PROJECTNAME tag, but without the coded tag.
  • Problem: no documented put method for tags.
  • In the logic of reusing content and annotations from one project to the next, Discourse has a handy tags co-occurrence feature: "give me all topics that have both the coded and the solar-power tags". This would tell the researcher how much content has already been annotated on the problem she is investigating.
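
The queue and the co-occurrence query described above are both set operations on topic tags. A minimal sketch, assuming topics have already been fetched into dicts with `id` and `tags` keys; that input shape and the function names are hypothetical, and no Discourse API call is shown.

```python
def coding_queue(topics, project_name, coded_tag="coded"):
    """IDs of topics tagged project-PROJECTNAME that do not yet carry the coded tag."""
    project_tag = f"project-{project_name}"
    return [
        t["id"]
        for t in topics
        if project_tag in t["tags"] and coded_tag not in t["tags"]
    ]


def already_coded_on(topics, subject_tag, coded_tag="coded"):
    """IDs of topics carrying both the coded tag and a subject tag (co-occurrence)."""
    return [
        t["id"]
        for t in topics
        if coded_tag in t["tags"] and subject_tag in t["tags"]
    ]


example_topics = [
    {"id": 1, "tags": ["project-opencare", "coded", "solar-power"]},
    {"id": 2, "tags": ["project-opencare"]},
    {"id": 3, "tags": ["solar-power"]},
]
queue = coding_queue(example_topics, "opencare")
reusable = already_coded_on(example_topics, "solar-power")
```

With the example data, topic 2 is the only one left in the opencare queue, and topic 1 is the only coded topic on solar-power.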

@tanius
Member

tanius commented Aug 23, 2017

"give me all topics that have both the coded and the solar-power tags"

That won't work, as we will not use the internal tag system of Discourse: Discourse tags have no proper hierarchy, no description, and probably no author.

@tanius
Member

tanius commented Aug 23, 2017

Here's a status report for @albertocottica etc. @damingo does the development; I only participated in the software design and decision making, and it seems the documentation part also falls to me … 😆

Architecture. We decided to add Annotator.js to a custom variant of the "pure HTML" print version of the Discourse output. Only staff members (admins and moderators) will have access to that part of the interface, and will see a button on each topic to open the tagging interface for that topic. In addition, there will be a "native Ruby on Rails" admin interface for Open Ethnographer, not integrated with the usual Discourse one, available under something like http://edgeryders.eu/openethnographer. With this setup, we route around the need to integrate Annotator.js etc. with the "great but complex" Ember framework used for the Discourse client-side JavaScript application. That is, we cut the development effort in half and in addition get something that only depends on Ruby on Rails, not on Discourse.

Data structures. Data structures for codes and annotations will be as required above. They will live in their own database tables; that is, Open Ethnographer codes are not Discourse tags. (Otherwise we'd need to hack away in Discourse core to count and show tag usage combined from topic tags and annotations, which seems a bad mess. Different concepts should get different implementations.) The only deviation from the specs above will be that the code hierarchy will indeed be a tree structure, as in the Drupal-based Open Ethnographer. For the concept hierarchy, this seems to be the way to go. To accommodate the use case of potentially intersecting selections of codes for studies, there will be another feature called "code collections", each being a set of zero or more codes. These will also be the units on which data export features will work (which will be discussed in a separate issue).
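
A minimal sketch of the tree-plus-collections idea, assuming each code stores a single `parent_id` and each collection is a named set of code IDs. The names `CodeNode`, `ancestors`, and the example data are hypothetical, not the deployed schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class CodeNode:
    id: int
    name: str
    parent_id: Optional[int] = None  # single parent, so the hierarchy is a tree


def ancestors(code_id: int, codes: Dict[int, CodeNode]) -> List[int]:
    """Walk up the tree and return ancestor IDs, nearest parent first."""
    path = []
    current = codes[code_id].parent_id
    while current is not None:
        path.append(current)
        current = codes[current].parent_id
    return path


# Example data (made up): a three-level concept hierarchy.
codes = {
    1: CodeNode(1, "care"),
    2: CodeNode(2, "community care", parent_id=1),
    3: CodeNode(3, "peer support", parent_id=2),
}

# Collections are independent of the tree and may intersect,
# which covers studies that share codes.
collections: Dict[str, Set[int]] = {
    "OpenCare": {2, 3},
    "Stewardship": {3},
}
shared = collections["OpenCare"] & collections["Stewardship"]
```

Keeping study membership out of the tree means a code shared by two studies needs no second parent; it simply appears in both sets.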

Time plan. The schedule looks achievable so far: the Discourse-based Open Ethnographer should be ready by 2017-09-05 or earlier. The original taggings should also be imported by then, but only those we can match automatically to their new positions in the corresponding Discourse post. This should be >95% of them, as we can search for the annotation text ("quote", redundantly saved) to anchor the annotation in the new text. (The remaining annotations will be manually transferred by @anuzement, for which there is no clear time plan yet.)
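
The automatic matching step could look roughly like this. It is a sketch under assumptions: it works on character offsets rather than word indexes, uses exact substring search, and the function name and the `old_start` tie-breaker are illustrative rather than taken from the actual import script.

```python
def reanchor(quote, new_text, old_start=None):
    """Locate a saved annotation quote in the migrated post text.

    Returns (start, end) character offsets, or None when the quote no longer
    occurs verbatim (such annotations would be left for manual transfer).
    If the quote occurs several times, the occurrence closest to the old
    start offset is chosen; without old_start, the first occurrence wins.
    """
    if not quote:
        return None

    # Collect every position where the quote occurs.
    positions = []
    start = new_text.find(quote)
    while start != -1:
        positions.append(start)
        start = new_text.find(quote, start + 1)

    if not positions:
        return None

    if old_start is None:
        best = positions[0]
    else:
        best = min(positions, key=lambda p: abs(p - old_start))
    return (best, best + len(quote))
```

For example, `reanchor("world", "Hello world, hello world")` returns `(6, 11)`, while passing `old_start=20` selects the second occurrence and returns `(19, 24)`.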

@tanius
Member

tanius commented Aug 28, 2017

The basic implementation of "Open Ethnographer for Discourse" is done and deployed on edgeryders.eu, see e22c554.

Please open separate issues for missing features and bugs.

@tanius tanius closed this as completed Aug 28, 2017
@tanius tanius reopened this Aug 28, 2017
@tanius tanius closed this as completed Aug 28, 2017
@tanius tanius changed the title from "Implement OpenEthnographer to work with Discourse" to "Implement Open Ethnographer to work with Discourse" Oct 7, 2017
@tanius tanius transferred this issue from edgeryders/discourse Sep 4, 2019