Merge pull request #60 from navigating-stories/docu_update

Updates docs
navigating-stories · May 7, 2024 · e0a90c6
2 parents 5f43031 + a766d18
commit e0a90c6

Showing 11 changed files with 150 additions and 16 deletions.
diff --git a/doc/widgets/actionanalysis.md b/doc/widgets/actionanalysis.md
@@ -3,19 +3,42 @@ Actions
 
 ![](../../orangecontrib/storynavigation/widgets/icons/action_analysis_icon.png)
 
-An Orange3 widget to highlight present and past tense actions, calculate their frequency, and identify actors associated with those actions, within in a textual story written in the Dutch language.
+The *Action Analysis* widget supports basic narrative analysis of actions in stories. It is part of the Orange3-Story-Navigator add-on for the Orange data mining software package. The widget highlights present and past tense actions, calculates their frequency, and identifies the actors associated with those actions in a textual story written in the Dutch language.
+
+**Main Features**
+- Analyses actions in stories based on part of speech (POS) tagging and verb tense.
+- Allows selection of entity type to highlight (POS or NER).
+- Provides options to filter and highlight specific parts of speech.
+- Supports copying of analysis results to clipboard.
+- Allows for customization of POS checkboxes for each POS type.
 
 **Inputs**
 
-- Corpus: A dataset of one or more textual story documents in Dutch.
-- Token categories (**optional**): a data table specifying one or more classification schemes of tokens or words. The table should consist of at least two columns. The first column is a list of words or tokens. All subsequent columns should contain strings which represent user-defined category labels for the corresponding word or token in the first column.
+- *Story elements*: The action widget always requires story elements from the elements widget. The widget will not work without this input.
+- *Stories*: A dataset of one or more textual story documents in Dutch.
+- *Token categories* (**optional**): a data table specifying one or more classification schemes of tokens or words. The table should consist of at least two columns. The first column is a list of words or tokens. All subsequent columns should contain strings which represent user-defined category labels for the corresponding word or token in the first column.
 
 **Outputs**
 
-- Frequency: A data table with exactly two columns. The first column is a list of actions mentioned in the input story. The second column is the number of times that action is mentioned in the story.
-- Tense frequency: A data table with exactly two columns and two rows. The first column consists of two cells with string values "Past tense" and "Present tense". The second column contains the frequency (raw counts of actions belonging to each category in the first column).
-- Custom tag stats (**optional**): A data table with exactly two columns. The first column is a list of words or tokens specified by the user. The second column is the number of times that word or token is mentioned in the story.
-- Actor action table: A data table with exactly two columns. The first column is a list of actors mentioned in the input story. The second column is a comma-separated string where each token in between the commas represents an action which that corresponding was involved in in the context of the story.
+- *Action stats*: A data table with four columns. 
+  - *storyid* is the first column, matching with a particular story from the corpus
+  - *segment_id* identifies the segment of the story to which the row applies; the number of segments per story is set in the elements widget
+  - *story navigator tag* describes the verb tense, indicating the time at which an action takes place (i.e., past, present, or future) 
+  - *wordcol* describes the frequency of a verb tense per story id.
+
+- *Custom tag stats* (**optional**): A data table with five columns.
+  - *storyid*, matching with a particular story from the corpus
+  - *segment_id* identifies the segment of the story to which the row applies (see the elements widget)
+  - *category* sub-categorizes the verbs in a story; the type of sub-category depends on the column *classification*. The specific (sub-)categories can be specified manually, depending on the research interests, and are provided as input to the elements widget. Note that a verb can belong to more than one category, depending on the context and quality of the verb.
+  - *freq* describes the verb-frequency within each subcategory (i.e., column category), per story id.
+  - *classification* is the higher-order category, as manually specified.
+
+- *Action table*: A data table with three columns. 
+  - *action* lists all the verbs that occurred across the corpus. Duplicate verbs occur because the action table accounts for the different types of entities associated with the action.
+  - *entities* lists all the actors associated with the action from the action column, across the entire corpus.
+  - *entities_type* further specifies the association between action and entity, based on the entity's morphological property (e.g., singular proper noun, noun that is singular and non-proper, etc.).
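
As a rough illustration of how the *Action stats* table relates to the *Story elements* input, the sketch below counts tense tags per story and segment in plain Python. The row layout and the tag labels (`past`, `present`, `future`) are assumptions made for this example; they are not taken from the widget's source code, which may compute the table differently.

```python
from collections import Counter

# Hypothetical rows shaped like the Story elements table
# (only the columns needed for this example).
rows = [
    {"story_id": 0, "segment_id": 0, "story_navigator_tag": "past"},
    {"story_id": 0, "segment_id": 0, "story_navigator_tag": "past"},
    {"story_id": 0, "segment_id": 0, "story_navigator_tag": "present"},
    {"story_id": 0, "segment_id": 1, "story_navigator_tag": "past"},
    {"story_id": 0, "segment_id": 1, "story_navigator_tag": "subject"},
    {"story_id": 1, "segment_id": 0, "story_navigator_tag": "present"},
]

# Assumed tense labels; the real tag names may differ.
TENSE_TAGS = {"past", "present", "future"}


def tense_counts(rows):
    """Count tense-tagged tokens per (story_id, segment_id, tag)."""
    return Counter(
        (row["story_id"], row["segment_id"], row["story_navigator_tag"])
        for row in rows
        if row["story_navigator_tag"] in TENSE_TAGS
    )


action_stats = tense_counts(rows)
for (story_id, segment_id, tag), freq in sorted(action_stats.items()):
    print(story_id, segment_id, tag, freq)
```

Each printed line corresponds to one row of a table with the four columns described for *Action stats*: story id, segment id, tense tag, and frequency.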
+
+
 
 Example usage:
 --------------

diff --git a/doc/widgets/actoranalysis.md b/doc/widgets/actoranalysis.md
@@ -5,20 +5,40 @@ Actors
 
 An Orange3 widget to highlight the main subjects of sentences as well as other potential actors or characters in a textual story written in the Dutch language.
 
+**Main Features**
+- POS Tagging: Perform part-of-speech (POS) tagging on the text of stories, tagging noun tokens and subject tokens as per user specifications.
+- Custom Tagging: Highlight custom tags specified by the user within the story text.
+- HTML Output: Returns HTML string representations of the POS tagged text, ready for rendering in the UI.
+- Frequency Calculation: Prepares data tables for frequencies of custom tokens specified by the user.
+- Actor Analysis Results: Generates actor analysis results including raw frequency, subject frequency, agency, and prominence score for each identified actor.
+
 **Inputs**
 
-- Corpus: A dataset of one or more textual story documents in Dutch.
-- Token categories (**optional**): a data table specifying one or more classification schemes of tokens or words. The table should consist of at least two columns. The first column is a list of words or tokens. All subsequent columns should contain strings which represent user-defined category labels for the corresponding word or token in the first column.
+- *Stories*: A corpus or dataset of one or more textual story documents in Dutch.
+
+- *Story elements*: The actor widget requires story elements from the elements widget. These story elements are the attributes extracted from the textual stories, including linguistic features such as words, sentences, parts of speech, and syntactic structures, along with additional metadata to contextualize the analysis. These elements enable the actor widget to perform detailed natural language processing and semantic analysis on the textual stories, facilitating the identification and characterization of actors, actions, and relationships within the narratives.
 
 **Outputs**
 
-- Frequency: A data table with exactly two columns. The first column is a list of actors mentioned in the input story. The second column is the number of times that actor is mentioned in the story.
-- Frequency as subject: A data table with exactly two columns. The first column is a list of actors mentioned in the input story. The second column is the number of times that actor is mentioned as the main subject of a sentence in the story.
-- Custom token frequency (**optional**): A data table with exactly two columns. The first column is a list of words or tokens specified by the user. The second column is the number of times that word or token is mentioned in the story.
-- Agency: A data table with exactly two columns. The first column is a list of actors mentioned in the input story. The second column is a score for that actor representing its [agency](https://journals.sagepub.com/doi/full/10.1177/0081175012462370?casa_token=Lx4o-GJ8wbAAAAAA%3AbolGvtXBrf_Wa84jvVSd02kCt4rXwCGs108iqHk0LoXo1nRMPKnsZwhumUtArpnk_hvJzNiyO7nL5w) in the story
-
-In all output data tables above, only the top 20 scores are given for each metric.
-
+- *Custom tag stats* (**optional**): A data table with five columns.
+  - *storyid*, matching with a particular story from the corpus
+  - *segment_id* identifies the segment of the story to which the row applies (see the elements widget)
+  - *category* sub-categorizes the verbs in a story; the type of sub-category depends on the column *classification*. The specific (sub-)categories can be specified manually, depending on the research interests, and are provided as input to the elements widget. Note that a verb can belong to more than one category, depending on the context and quality of the verb.
+  - *freq* describes the verb-frequency within each subcategory (i.e., column category), per story id.
+  - *classification* is the higher-order category, as manually specified and input for the elements widget.  
+
+- *Actor stats*: The elements widget generates a data table containing tagging data for all stories processed. Each row in the data table represents a tagged token within a sentence, providing comprehensive information for further analysis and interpretation. It includes the following columns:
+
+  - *token_text_lowercase*: The text of the token.
+  - *storyid*: Unique identifier for the story.
+  - *segment_id*: Identifier for the story segment.
+  - *raw_freq*: Raw frequency of the custom token in the story segment.
+  - *subj_freq*: Subject frequency of the custom token in the story segment.
+  - *agency*: The agency of the custom token in the story segment, representing the extent of an entity's involvement or influence within the narrative. [Agency](https://journals.sagepub.com/doi/full/10.1177/0081175012462370?casa_token=Lx4o-GJ8wbAAAAAA%3AbolGvtXBrf_Wa84jvVSd02kCt4rXwCGs108iqHk0LoXo1nRMPKnsZwhumUtArpnk_hvJzNiyO7nL5w) measures how actively a particular entity (such as a character, organization, or concept) is engaged in actions or events described in the story. Higher agency values indicate that the entity is more actively involved in driving the narrative forward.
+    - The agency is calculated as the ratio of the total occurrences of the entity being the subject of sentences to the total number of sentences in which the entity appears.
+  - *prominence_sf*: The prominence score of the custom token in the story segment, measuring the entity's significance relative to others in the story. Higher prominence scores suggest that the entity plays a more crucial or central role in the narrative.
+    - The prominence score is calculated based on the relative frequency of the entity's appearance in subject positions across sentences in the story. It considers both the frequency of occurrence and the distribution of the entity's mentions throughout the narrative. The prominence score calculation involves normalization to account for variations in story length and ensures comparability across different stories.
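
The agency ratio described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the widget's implementation: the rows are hypothetical and only borrow the column names `token_text_lowercase`, `sentence_id`, and `is_sentence_subject_boolean` from the *Story elements* table, and all rows are assumed to belong to a single story.

```python
from collections import defaultdict

# Hypothetical tagged tokens from one story (column names follow the
# Story elements table; the values are invented for illustration).
rows = [
    {"token_text_lowercase": "anna", "sentence_id": 0, "is_sentence_subject_boolean": True},
    {"token_text_lowercase": "hond", "sentence_id": 0, "is_sentence_subject_boolean": False},
    {"token_text_lowercase": "anna", "sentence_id": 1, "is_sentence_subject_boolean": False},
    {"token_text_lowercase": "hond", "sentence_id": 1, "is_sentence_subject_boolean": True},
    {"token_text_lowercase": "anna", "sentence_id": 2, "is_sentence_subject_boolean": True},
]


def agency_scores(rows):
    """Return, per entity, (# occurrences as subject) / (# sentences containing it)."""
    subject_counts = defaultdict(int)
    sentences_seen = defaultdict(set)
    for row in rows:
        token = row["token_text_lowercase"]
        sentences_seen[token].add(row["sentence_id"])
        if row["is_sentence_subject_boolean"]:
            subject_counts[token] += 1
    return {
        token: subject_counts[token] / len(sentence_ids)
        for token, sentence_ids in sentences_seen.items()
    }


scores = agency_scores(rows)
for token, score in sorted(scores.items()):
    print(f"{token}: {score:.2f}")
```

In this toy data, "anna" is the subject in two of the three sentences in which it appears, and "hond" in one of two, so "anna" receives the higher agency value.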
+
 Example usage:
 --------------
 

diff --git a/doc/widgets/elements.md b/doc/widgets/elements.md
@@ -0,0 +1,56 @@
+Elements
+=======
+
+![](../../orangecontrib/storynavigation/widgets/icons/tagger_icon.png)
+
+The *Elements Analysis* widget is part of the Orange Story Navigator add-on, designed for Natural Language Processing (NLP) tagging of actors and actions in textual stories. It serves as a tool for extracting relevant information from stories, particularly useful in the context of narrative analysis and text mining.
+
+**Main Features**
+- NLP tagging: Utilizes spaCy's natural language processing capabilities to analyze and tag text, identifying actors (subjects) and actions within sentences.
+- Custom tagging: Supports the incorporation of custom tags and word columns for tailored analysis, allowing users to define specific categories for tagging.
+- Language support: Available in multiple languages, including English and Dutch, with support for additional languages in future updates.
+- Segment analysis: Divides stories into segments for more granular analysis, enabling users to examine tagging patterns within specific sections of text.
+- Error handling: Implements robust error handling mechanisms to ensure smooth processing even in the presence of unexpected inputs or issues.
+
+**Inputs**
+
+- *Custom tags* (**optional**): allow users to define and highlight specific categories or entities in the text based on their requirements or domain expertise. These custom tags are typically user-defined labels that represent meaningful concepts or entities within the stories. 
+  - Users can define custom tags to identify entities such as named entities, key concepts, thematic categories, sentiment indicators, or any other domain-specific elements of interest.
+  - Custom tags can, for example, be imported as a CSV file via the *File* widget from the Data module.
+  - By incorporating custom tags into the analysis, users can gain deeper insights into the textual content and extract meaningful information tailored to their specific needs.
+
+- *Stories*: The text of a story to be analyzed. The module will not work without this input. It could be any textual content, such as a news article, a blog post, a social media post, or any other form of written text. The widget assigns a unique identifier to the story, distinguishing one story from another within the corpus of stories.
+  - Widget options:
+    - Language: Specifies the language of the input stories, currently supporting 'en' (English) and 'nl' (Dutch).
+    - Number of Segments: Determines the number of segments into which each story will be split for analysis.
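
To make the shape of a custom tags table more concrete, the sketch below builds a small CSV in memory and reads it back with Python's standard `csv` module. The layout (words in the first column, one column per classification scheme) follows the description of *Token categories* in the action widget documentation; the column names, words, and labels are invented for illustration.

```python
import csv
import io

# Hypothetical custom tags file: words in the first column,
# one additional column per user-defined classification scheme.
CUSTOM_TAGS_CSV = """word,activity,sentiment
lopen,movement,neutral
rennen,movement,neutral
huilen,expression,negative
lachen,expression,positive
"""


def load_custom_tags(text):
    """Map each word to its category label under every classification scheme."""
    reader = csv.DictReader(io.StringIO(text))
    word_column = reader.fieldnames[0]
    tags = {}
    for row in reader:
        word = row[word_column].strip().lower()
        tags[word] = {
            scheme: label
            for scheme, label in row.items()
            if scheme != word_column
        }
    return tags


custom_tags = load_custom_tags(CUSTOM_TAGS_CSV)
print(custom_tags["huilen"])
```

A file with this layout could then be saved to disk and loaded into Orange before being connected to the *Custom tags* input.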
+
+
+**Outputs**
+
+- *Story elements*: The elements widget generates a data table containing tagging data for all stories processed. Each row in the data table represents a tagged token within a sentence, providing comprehensive information for further analysis and interpretation. It includes the following columns:
+
+  - sentence: The text of the sentence within the story.
+  - token_text: The text of the token within the sentence.
+  - token_text_lowercase: The lowercase version of the token text.
+  - index: The index of the token within the sentence.
+  - story_id: An identifier for the story to which the sentence belongs.
+  - token_start_idx: The starting index of the token within the sentence.
+  - token_end_idx: The ending index of the token within the sentence.
+  - story_navigator_tag: The assigned tag for the token based on its role in the sentence.
+  - spacy_Tag: The coarse-grained part-of-speech (POS) tag of the token.
+  - spacy_finegrained_tag: The fine-grained POS tag of the token.
+  - spacy_dependency: The syntactic linguistic dependency relation of the token.
+  - is_pronoun_boolean: Indicates whether the token is a pronoun (True or False).
+  - is_sentence_subject_boolean: Indicates whether the token is a subject of its sentence (True or False).
+  - active_voice_subject_boolean: Indicates whether the token is involved in an active voice subject role in the sentence (True or False).
+  - associated_action: The associated action or verb corresponding to the token.
+  - sentence_id: A unique identifier for each sentence within a story.
+  - segment_id: A numerical identifier indicating the segment to which the sentence belongs, based on the specified number of segments to split each story into.
+  - associated_Action_lowercase: The lowercase version of the associated action.
+  - lang: The language of the sentence.
+  - num_words_in_sentence: The number of words in the sentence.
+
+Example usage:
+--------------
+
+![](images/sn_action_analysis_example.png)
diff --git a/doc/widgets/images/Orange_menu.png b/doc/widgets/images/Orange_menu.png
diff --git a/doc/widgets/images/download.png b/doc/widgets/images/download.png
diff --git a/doc/widgets/images/install_addon.png b/doc/widgets/images/install_addon.png
diff --git a/doc/widgets/images/orange_install.png b/doc/widgets/images/orange_install.png
diff --git a/doc/widgets/images/restart.png b/doc/widgets/images/restart.png
diff --git a/doc/widgets/images/storynavigator_logo.png b/doc/widgets/images/storynavigator_logo.png
diff --git a/doc/widgets/images/tagger_icon.png b/doc/widgets/images/tagger_icon.png
diff --git a/doc/widgets/install.md b/doc/widgets/install.md
@@ -0,0 +1,35 @@
+How to enable StoryNavigator in Orange
+=======
+![](../../doc/widgets/images/storynavigator_logo.png)
+
+
+To install the Orange3 StoryNavigator Add-on, use the following steps.
+
+**Install and start Orange**
+
+First, make sure to have installed the latest version of Orange3. If you haven't installed Orange3 yet, you can download it from the [Orange website](https://orange.biolab.si/download/).
+
+![](../../doc/widgets/images/download.png)
+
+
+**Open Add-on manager**
+
+Additional features can be added to Orange by installing add-ons. After having opened Orange, go to the *options* menu at the top of the screen and select the *Add-ons* option. A dialog will open that will list and describe existing add-ons.
+
+![](../../doc/widgets/images/Orange_menu.png)
+
+
+**Add storynavigator**
+
+In the menu that appears, select *Add more*. Here, you can add an add-on by typing its name. Type *storynavigator* in the search bar, so that storynavigator appears in the list of add-ons. Then tick the checkbox in front of storynavigator and click the *OK* button.
+
+![](../../doc/widgets/images/install_addon.png)
+
+You will see a progress bar while the add-on is being installed (this may take a while, up to 10-15 minutes). When the installation is completed, a dialog will ask you to restart Orange. Click the *Restart* button.
+
+![](../../doc/widgets/images/restart.png)
+
+This is not an automatic restart. You must close Orange and reopen the program.
+
+**This completes your installation of Orange and the StoryNavigator Add-on.**