Setting analysis widget #14

kodymoodley · 2024-01-04T12:38:43Z

Implement one feature for analysing the setting of a story.

One approach could be to obtain a list of keywords (words that uniquely identify the story), say 'kw'.
Thereafter, we could find the N 'closest' words to each word in 'kw' within a pretrained embedding space for Dutch.
The cluster(s) of these words (e.g. projected with t-SNE) could be rendered to the screen to inform the setting.
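
As a rough illustration of the nearest-neighbour step, here is a minimal sketch using cosine similarity. The embedding vectors, the helper names `cosine` and `nearest_words`, and the example keywords are all made up for illustration; a real version would load a pretrained Dutch embedding model instead, and the t-SNE projection is left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(word, embeddings, n=2):
    """Return the n words whose vectors are most similar to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:n]]


# Toy 3-dimensional vectors (hypothetical, NOT from a real model).
TOY_EMBEDDINGS = {
    "strand": [0.9, 0.1, 0.0],
    "zee": [0.8, 0.2, 0.1],
    "duin": [0.7, 0.3, 0.0],
    "school": [0.0, 0.9, 0.2],
    "klas": [0.1, 0.8, 0.3],
}

kw = ["strand", "school"]  # keywords extracted from a story (assumed)
neighbours = {w: nearest_words(w, TOY_EMBEDDINGS, n=2) for w in kw}
print(neighbours)
```

With real embeddings, the neighbour lists for all keywords could then be pooled and projected to 2D for plotting.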

f-hafner · 2024-01-29T10:06:21Z

We defined the following subtasks:

- start from corpus of stories
- remove stopwords
- lemmatize
- put into a dataframe together with storyid and segment id
- prepare embeddings: @kodymoodley finds out which model to use
- extract similar words in embedding space
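
The preprocessing subtasks above could look roughly like the sketch below. It is only an assumption of the shape of the pipeline: the stopword set and lemma lookup are tiny hand-made stand-ins for what the spaCy model would provide, and the function name `preprocess` and the column name `segmentid` are hypothetical.

```python
# Hand-made stand-ins for a real stopword list and lemmatizer (assumptions).
STOPWORDS = {"de", "het", "een", "en", "op", "in"}
LEMMAS = {"liepen": "lopen", "kinderen": "kind", "stranden": "strand"}


def preprocess(stories):
    """Turn {storyid: [segment texts]} into one row per kept lemma."""
    rows = []
    for story_id, segments in stories.items():
        for segment_id, text in enumerate(segments):
            for token in text.lower().split():
                token = token.strip(".,!?")
                # Drop empty strings and stopwords.
                if not token or token in STOPWORDS:
                    continue
                rows.append(
                    {
                        "storyid": story_id,
                        "segmentid": segment_id,
                        # Fall back to the token itself if no lemma is known.
                        "lemma": LEMMAS.get(token, token),
                    }
                )
    return rows


stories = {"s1": ["De kinderen liepen op het strand.", "Het regende."]}
rows = preprocess(stories)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` to obtain the dataframe with story and segment ids.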

f-hafner · 2024-01-29T15:32:19Z

Questions to discuss

I am reusing the spaCy model loaded for other tasks. Is this OK here?
- For instance, "merge_noun_chunks" is added to the nlp model. With it, "Mijn eerste vriendje" ("My first boyfriend") becomes ["mijn een vriendje"]; without it, we get ["mijn", "een", "vriendje"].
Refactoring
- The structures of the tagger and the setting analyzer are now quite similar; maybe we can think of combining them?
- Test the function util.is_valid_token(); reuse it in tagging.py.
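
To make the noun-chunk question concrete, the sketch below mimics the effect of merging on the example above without using spaCy. The helper `merge_spans` and the vocabulary are hypothetical; the point is only that a merged multi-word token is unlikely to be found in an embedding vocabulary keyed by single words.

```python
def merge_spans(lemmas, spans):
    """Join the lemmas inside each non-empty (start, end) span into one token."""
    ends = {start: end for start, end in spans}
    merged, i = [], 0
    while i < len(lemmas):
        if i in ends:
            merged.append(" ".join(lemmas[i:ends[i]]))
            i = ends[i]
        else:
            merged.append(lemmas[i])
            i += 1
    return merged


lemmas = ["mijn", "een", "vriendje"]  # lemmas reported in the comment above
vocab = {"mijn", "een", "vriendje"}   # made-up single-word embedding vocabulary

unmerged = lemmas
merged = merge_spans(lemmas, [(0, 3)])  # treat the whole phrase as one chunk

print(merged)                               # tokens after merging
print([t for t in unmerged if t in vocab])  # tokens with an embedding, unmerged
print([t for t in merged if t in vocab])    # tokens with an embedding, merged
```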

kodymoodley · 2024-02-15T07:48:08Z

Thanks very much, @f-hafner! Having the preprocessing completed is already super helpful. The lead applicants have recently informed me that they would like to pause work on the Setting widget until after the workshop, so this feature is no longer required for the workshop in April. But I / we could resume where you left off after the workshop.

kodymoodley · 2024-02-15T07:49:42Z

@f-hafner, I will revisit this comment in April / May. Right now, I suspect that merging the noun chunks would not be necessary for what we want to do.

kodymoodley added the enhancement New feature or request label Jan 4, 2024

kodymoodley self-assigned this Jan 4, 2024

kodymoodley assigned f-hafner Jan 29, 2024

f-hafner mentioned this issue Jan 29, 2024

Tupelize corpus #32

Draft

3 tasks

f-hafner linked a pull request Jan 29, 2024 that will close this issue

Setting widget #33

Draft

3 tasks
