
Commit

Merge pull request #147 from meghdadFar/release
Release 1.4.0
meghdadFar committed Apr 8, 2024
2 parents cd51c91 + ca6b7b9 commit 8825baf
Showing 21 changed files with 131 additions and 28 deletions.
Binary file modified docs/.doctrees/api.doctree
Binary file modified docs/.doctrees/chat.doctree
Binary file modified docs/.doctrees/mwes.doctree
Binary file added docs/_images/chat_mwe.png
Binary file added docs/_images/chat_stats.png
43 changes: 40 additions & 3 deletions docs/_sources/chat.rst.txt
@@ -19,14 +19,51 @@ call the `chat` method to interact with the data and get insights from it via Na
from wordview.text_analysis import TextStatsPlots
imdb_df = pd.read_csv("data/IMDB_Dataset_sample_5k.csv")
-with open("wordview/chat/secrets/openai_api_key.json", "r") as f:
+with open("your_secrets_dir/openai_api_key.json", "r") as f:
    credentials = json.load(f)
tsp = TextStatsPlots(df=imdb_df, text_column="review")
tsp.chat(api_key=credentials.get("openai_api_key"))
The chat UI is available under http://127.0.0.1:5000/

|chat|
|chat_stats|

.. |chat| image:: ../figs/chat.png
Chat with MWEs
~~~~~~~~~~~~~~

After allowing Wordview to extract MWEs, you can call the `chat` method to get insights from this extraction through Natural Language.

.. code:: python
import json
import pandas as pd
from wordview.mwes import MWE
from wordview.preprocessing import NgramExtractor
imdb_df = pd.read_csv("data/IMDB_Dataset_sample_5k.csv")
with open("your_secrets_dir/openai_api_key.json", "r") as f:
    credentials = json.load(f)
extractor = NgramExtractor(imdb_df, "review")
extractor.extract_ngrams()
extractor.get_ngram_counts(ngram_count_file_path="ngram_counts.json")
mwe_obj = MWE(imdb_df, 'review',
ngram_count_file_path='ngram_counts.json',
language='EN',
custom_patterns="NP: {<DT>?<JJ>*<NN>}",
only_custom_patterns=False,
)
mwe_obj.extract_mwes(sort=True, top_n=10)
mwe_obj.chat(api_key=credentials.get("openai_api_key"))
The chat UI for MWEs is available under http://127.0.0.1:5001/

|chat_mwe|

.. |chat_stats| image:: ../figs/chat_stats.png

.. |chat_mwe| image:: ../figs/chat_mwe.png
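Both chat examples above read the key with `credentials.get("openai_api_key")`, which silently yields `None` when the file or the key is missing. Below is a minimal sketch of a stricter loader. It assumes the same JSON layout (`{"openai_api_key": "..."}`) and adds a fallback to an `OPENAI_API_KEY` environment variable, which is an extra assumption that wordview itself does not document. `load_openai_key` is a hypothetical helper, not part of wordview.

```python
import json
import os
from pathlib import Path


def load_openai_key(path: str, env_var: str = "OPENAI_API_KEY") -> str:
    """Return an OpenAI API key, failing loudly instead of returning None.

    Hypothetical helper, not part of wordview. Looks for
    {"openai_api_key": "..."} in the JSON file at `path`, then falls back to
    the environment variable `env_var` (an assumption, not wordview behaviour).
    """
    secrets_file = Path(path)
    if secrets_file.is_file():
        with secrets_file.open("r", encoding="utf-8") as f:
            key = json.load(f).get("openai_api_key")
        if key:
            return key
    # Fallback: read the key from the environment instead of the secrets file.
    key = os.environ.get(env_var)
    if key:
        return key
    raise RuntimeError(f"No OpenAI API key found in {path} or in ${env_var}")
```

Usage would then be `tsp.chat(api_key=load_openai_key("your_secrets_dir/openai_api_key.json"))`, so a wrong path raises an error before `chat` is called with an empty key.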
12 changes: 8 additions & 4 deletions docs/_sources/mwes.rst.txt
@@ -32,16 +32,20 @@ the documentation.
custom_patterns="NP: {<DT>?<JJ>*<NN>}",
only_custom_patterns=False,
)
-mwes = mwe_obj.extract_mwes(sort=True, top_n=10)
-json.dump(mwes, open('data/mwes.json', 'w'), indent=4)
+mwe_obj.extract_mwes(sort=True, top_n=10)
+json.dump(mwe_obj.mwes, open('data/mwes.json', 'w'), indent=4)
-The above returns the results in a dictionary, that in this example we stored in `mwes.json` file.
+The above stores the results in `mwe_obj.mwes` as a dictionary, which in this example we save to a JSON file called `data/mwes.json`.
You can also return the result in a table:

.. code-block:: python
mwe_obj.print_mwe_table()
Which will return a table like this:

.. code-block:: text
╔═════════════════════════╦═══════════════╗
║ LVC ║ Association ║
╠═════════════════════════╬═══════════════╣
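The updated snippet passes an unclosed `open(...)` handle straight to `json.dump`, which relies on the garbage collector to flush and close the file; a `with` block closes it deterministically. A small sketch, assuming `mwe_obj.mwes` is a nested dictionary of the form `{MWE type: {MWE instance: association measure}}` (the shape described in the MWE chat prompt further down this commit). `save_mwes` and `load_mwes` are hypothetical helpers, not wordview functions.

```python
import json
from pathlib import Path
from typing import Dict

# Assumed shape of mwe_obj.mwes: {MWE type: {MWE instance: association measure}}
MWEDict = Dict[str, Dict[str, float]]


def save_mwes(mwes: MWEDict, path: str) -> None:
    """Write the MWE dictionary to `path` as indented JSON."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # e.g. create data/ on first run
    with out.open("w", encoding="utf-8") as f:  # file is closed when the block exits
        json.dump(mwes, f, indent=4)


def load_mwes(path: str) -> MWEDict:
    """Read a dictionary previously written by save_mwes."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

In the documented example this would be `save_mwes(mwe_obj.mwes, "data/mwes.json")`, which also creates `data/` if it does not exist yet.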
29 changes: 24 additions & 5 deletions docs/api.html


36 changes: 34 additions & 2 deletions docs/chat.html
@@ -60,6 +60,7 @@
<li class="toctree-l1"><a class="reference internal" href="clustering.html">Cluster Analysis</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Chat with Wordview</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#chat-with-textstatsplots">Chat with TextStatsPlots</a></li>
<li class="toctree-l2"><a class="reference internal" href="#chat-with-mwes">Chat with MWEs</a></li>
</ul>
</li>
</ul>
@@ -115,15 +116,46 @@ <h2>Chat with TextStatsPlots<a class="headerlink" href="#chat-with-textstatsplot

<span class="kn">from</span> <span class="nn">wordview.text_analysis</span> <span class="kn">import</span> <span class="n">TextStatsPlots</span>
<span class="n">imdb_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;data/IMDB_Dataset_sample_5k.csv&quot;</span><span class="p">)</span>
-<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;wordview/chat/secrets/openai_api_key.json&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
+<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;your_secrets_dir/openai_api_key.json&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">credentials</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="n">tsp</span> <span class="o">=</span> <span class="n">TextStatsPlots</span><span class="p">(</span><span class="n">df</span><span class="o">=</span><span class="n">imdb_df</span><span class="p">,</span> <span class="n">text_column</span><span class="o">=</span><span class="s2">&quot;review&quot;</span><span class="p">)</span>
<span class="n">tsp</span><span class="o">.</span><span class="n">chat</span><span class="p">(</span><span class="n">api_key</span><span class="o">=</span><span class="n">credentials</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;openai_api_key&quot;</span><span class="p">))</span>
</pre></div>
</div>
<p>The chat UI is available under <a class="reference external" href="http://127.0.0.1:5000/">http://127.0.0.1:5000/</a></p>
<p><img alt="chat" src="_images/chat.png" /></p>
<p><img alt="chat_stats" src="_images/chat_stats.png" /></p>
</section>
<section id="chat-with-mwes">
<h2>Chat with MWEs<a class="headerlink" href="#chat-with-mwes" title="Permalink to this heading"></a></h2>
<p>After allowing Wordview to extract MWEs, you can call the <cite>chat</cite> method to get insights from this extraction through Natural Language.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>

<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>

<span class="kn">from</span> <span class="nn">wordview.mwes</span> <span class="kn">import</span> <span class="n">MWE</span>
<span class="kn">from</span> <span class="nn">wordview.preprocessing</span> <span class="kn">import</span> <span class="n">NgramExtractor</span>

<span class="n">imdb_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s2">&quot;data/IMDB_Dataset_sample_5k.csv&quot;</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;your_secrets_dir/openai_api_key.json&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="n">credentials</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>

<span class="n">extractor</span> <span class="o">=</span> <span class="n">NgramExtractor</span><span class="p">(</span><span class="n">imdb_df</span><span class="p">,</span> <span class="s2">&quot;review&quot;</span><span class="p">)</span>
<span class="n">extractor</span><span class="o">.</span><span class="n">extract_ngrams</span><span class="p">()</span>
<span class="n">extractor</span><span class="o">.</span><span class="n">get_ngram_counts</span><span class="p">(</span><span class="n">ngram_count_file_path</span><span class="o">=</span><span class="s2">&quot;ngram_counts.json&quot;</span><span class="p">)</span>

<span class="n">mwe_obj</span> <span class="o">=</span> <span class="n">MWE</span><span class="p">(</span><span class="n">imdb_df</span><span class="p">,</span> <span class="s1">&#39;review&#39;</span><span class="p">,</span>
<span class="n">ngram_count_file_path</span><span class="o">=</span><span class="s1">&#39;ngram_counts.json&#39;</span><span class="p">,</span>
<span class="n">language</span><span class="o">=</span><span class="s1">&#39;EN&#39;</span><span class="p">,</span>
<span class="n">custom_patterns</span><span class="o">=</span><span class="s2">&quot;NP: {&lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;}&quot;</span><span class="p">,</span>
<span class="n">only_custom_patterns</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">mwe_obj</span><span class="o">.</span><span class="n">extract_mwes</span><span class="p">(</span><span class="n">sort</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
<span class="n">mwe_obj</span><span class="o">.</span><span class="n">chat</span><span class="p">(</span><span class="n">api_key</span><span class="o">=</span><span class="n">credentials</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;openai_api_key&quot;</span><span class="p">))</span>
</pre></div>
</div>
<p>The chat UI for MWEs is available under <a class="reference external" href="http://127.0.0.1:5001/">http://127.0.0.1:5001/</a></p>
<p><img alt="chat_mwe" src="_images/chat_mwe.png" /></p>
</section>
</section>

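The TextStatsPlots chat UI is served at http://127.0.0.1:5000/ and the MWE chat UI at http://127.0.0.1:5001/. If you script around `chat()`, a quick TCP check can tell whether a UI is already listening before you open a browser. This is a stdlib sketch with a hypothetical `chat_ui_is_up` helper; it only tests that the port accepts connections, not that the page renders correctly.

```python
import socket


def chat_ui_is_up(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper, not part of wordview. Use port=5001 for the MWE chat UI.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```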
6 changes: 5 additions & 1 deletion docs/genindex.html
@@ -153,8 +153,12 @@ <h2 id="B">B</h2>
<h2 id="C">C</h2>
<table style="width: 100%" class="indextable genindextable"><tr>
<td style="width: 33%; vertical-align: top;"><ul>
-<li><a href="api.html#wordview.text_analysis.TextStatsPlots.chat">chat() (wordview.text_analysis.TextStatsPlots method)</a>
+<li><a href="api.html#wordview.mwes.MWE.chat">chat() (wordview.mwes.MWE method)</a>

<ul>
<li><a href="api.html#wordview.text_analysis.TextStatsPlots.chat">(wordview.text_analysis.TextStatsPlots method)</a>
</li>
</ul></li>
</ul></td>
<td style="width: 33%; vertical-align: top;"><ul>
<li><a href="api.html#wordview.clustering.cluster.Cluster">Cluster (class in wordview.clustering.cluster)</a>
14 changes: 9 additions & 5 deletions docs/mwes.html
@@ -58,6 +58,7 @@
<li class="toctree-l1"><a class="reference internal" href="bias.html">Bias Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="anomalies.html">Analysis of Anomalies &amp; Outliers</a></li>
<li class="toctree-l1"><a class="reference internal" href="clustering.html">Cluster Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="chat.html">Chat with Wordview</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Utilities</span></p>
<ul>
@@ -126,14 +127,17 @@ <h1>Analysis &amp; Extraction of Multiword Expressions (MWEs)<a class="headerlin
<span class="n">custom_patterns</span><span class="o">=</span><span class="s2">&quot;NP: {&lt;DT&gt;?&lt;JJ&gt;*&lt;NN&gt;}&quot;</span><span class="p">,</span>
<span class="n">only_custom_patterns</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="p">)</span>
-<span class="n">mwes</span> <span class="o">=</span> <span class="n">mwe_obj</span><span class="o">.</span><span class="n">extract_mwes</span><span class="p">(</span><span class="n">sort</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
-<span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">mwes</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;data/mwes.json&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">),</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
+<span class="n">mwe_obj</span><span class="o">.</span><span class="n">extract_mwes</span><span class="p">(</span><span class="n">sort</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">top_n</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
+<span class="n">json</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="n">mwe_obj</span><span class="o">.</span><span class="n">mwes</span><span class="p">,</span> <span class="nb">open</span><span class="p">(</span><span class="s1">&#39;data/mwes.json&#39;</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">),</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
</pre></div>
</div>
-<p>The above returns the results in a dictionary, that in this example we stored in <cite>mwes.json</cite> file.
+<p>The above stores the results in <cite>mwe_obj.mwes</cite> as a dictionary, which in this example we save to a JSON file called <cite>data/mwes.json</cite>.
You can also return the result in a table:</p>
-<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>mwe_obj.print_mwe_table()
-╔═════════════════════════╦═══════════════╗
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">mwe_obj</span><span class="o">.</span><span class="n">print_mwe_table</span><span class="p">()</span>
+</pre></div>
+</div>
+<p>Which will return a table like this:</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>╔═════════════════════════╦═══════════════╗
║ LVC ║ Association ║
╠═════════════════════════╬═══════════════╣
║ SHOOT the binding ║ 26.02 ║
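`print_mwe_table()` prints each MWE type as a two-column box table (MWE instance and association measure). To show how such output can be produced from the nested dictionary, here is a stdlib-only sketch. `render_mwe_table` is hypothetical and is not wordview's implementation; its column padding differs from the table above.

```python
from typing import Dict


def render_mwe_table(mwe_type: str, instances: Dict[str, float]) -> str:
    """Render one MWE type as a two-column box-drawing table (hypothetical helper)."""
    rows = [(name, f"{score:.2f}") for name, score in instances.items()]
    # Column widths: widest cell plus two characters of padding.
    left = max([len(mwe_type)] + [len(name) for name, _ in rows]) + 2
    right = max([len("Association")] + [len(score) for _, score in rows]) + 2
    lines = [
        "╔" + "═" * left + "╦" + "═" * right + "╗",
        "║" + mwe_type.center(left) + "║" + "Association".center(right) + "║",
        "╠" + "═" * left + "╬" + "═" * right + "╣",
    ]
    for name, score in rows:
        lines.append("║" + f" {name}".ljust(left) + "║" + score.center(right) + "║")
    lines.append("╚" + "═" * left + "╩" + "═" * right + "╝")
    return "\n".join(lines)
```

For example, `print(render_mwe_table("LVC", mwe_obj.mwes["LVC"]))`, assuming an `"LVC"` entry is present in the extracted results.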
Binary file modified docs/objects.inv
2 changes: 1 addition & 1 deletion docs/searchindex.js


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "wordview"
-version = "1.3.0"
+version = "1.4.0"
description = """Wordview is a Python package for Exploratory Data Analysis of text and provides many statistics about your data in the form of plots, tables, and descriptions allowing you to have both a high-level and detailed overview of your data."""
authors = ["meghdadFar <meghdad.farahmand@gmail.com>"]
include = ["CHANGES.rst"]
Binary file removed sphinx-docs/figs/chat.png
Binary file added sphinx-docs/figs/chat_mwe.png
Binary file added sphinx-docs/figs/chat_stats.png
6 changes: 4 additions & 2 deletions sphinx-docs/source/chat.rst
@@ -27,6 +27,7 @@ call the `chat` method to interact with the data and get insights from it via Na
The chat UI is available under http://127.0.0.1:5000/

|chat_stats|

Chat with MWEs
~~~~~~~~~~~~~~
@@ -61,7 +62,8 @@ After allowing Wordview to extract MWEs, you can call the `chat` method to get i
The chat UI for MWEs is available under http://127.0.0.1:5001/

|chat_mwe|

|chat|
.. |chat_stats| image:: ../figs/chat_stats.png

.. |chat| image:: ../figs/chat.png
.. |chat_mwe| image:: ../figs/chat_mwe.png
2 changes: 1 addition & 1 deletion wordview/chat_ui/chat.html
@@ -34,7 +34,7 @@
} */
.message-container {
overflow-y: auto; /* Enables vertical scrolling */
-max-height: 500px; /* Set a max-height that fits your design */
+max-height: 850px; /* Set a max-height that fits your design */
padding: 10px;
margin-bottom: 10px;
width: 100%; /* Ensure it fills the container */
5 changes: 3 additions & 2 deletions wordview/mwes/mwe.py
@@ -43,7 +43,7 @@ def __init__(
language: str = "EN",
custom_patterns: Optional[str] = None,
only_custom_patterns: bool = False,
-mwe_frequency_threshold: int = 3,
+mwe_frequency_threshold: int = 10,
association_threshold: float = 1.0,
) -> None:
"""Initializes a new instance of MWE class.
@@ -64,7 +64,7 @@ def __init__(
ADJP: {<RB|RBR|RBS>*<JJ>} # Adjective phrase
ADVP: {<RB.*>+<VB.*><RB.*>*} # Adverb phrase'''
only_custom_pattern: If True, only the custom pattern will be used to extract MWEs, otherwise, the default patterns will be used as well.
-mwe_frequency_threshold: The minimum frequency of an MWE to be considered for extraction. Defaults to 3.
+mwe_frequency_threshold: The minimum frequency of an MWE to be considered for extraction. Defaults to 10.
association_threshold: A threshold value for the association measure. Only MWEs with an association measure above this threshold will be returned.
Returns:
@@ -151,6 +151,7 @@ def chat(self, api_key: str = ""):
"MWE Type": "MWE instance 1": "Association measure", "MWE instance 2": "Association measure", ...\n
- There could be other custom types in which case you should just mention the dictionary key.\n
- Depending on a parameter N set by the user, each MWE type contains at most N instances. But it can contain less or even 0.
- Return the association measures that you read from the dictionary with only two decimal places.
"""
chat_history = [
{"role": "system", "content": base_content},
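To make the effect of raising the default `mwe_frequency_threshold` from 3 to 10 concrete, here is a hypothetical filter over precomputed candidate counts and association scores. It reads "minimum frequency" as inclusive (`>=`) and "above this threshold" as strict (`>`); that is an interpretation of the docstring, not a description of wordview's code, and `filter_candidates` is not a wordview function.

```python
from typing import Dict


def filter_candidates(
    counts: Dict[str, int],
    scores: Dict[str, float],
    mwe_frequency_threshold: int = 10,
    association_threshold: float = 1.0,
) -> Dict[str, float]:
    """Keep candidates that are frequent enough and strongly associated.

    Hypothetical illustration of the two documented thresholds.
    """
    kept: Dict[str, float] = {}
    for candidate, score in scores.items():
        # Minimum frequency, read as inclusive: fewer occurrences are dropped.
        if counts.get(candidate, 0) < mwe_frequency_threshold:
            continue
        # The association measure must be strictly above the threshold.
        if score <= association_threshold:
            continue
        kept[candidate] = score
    # Strongest associations first.
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))
```

Under this reading, a candidate seen 5 times passed the old default of 3 but is dropped by the new default of 10, so small corpora will yield fewer MWEs unless `mwe_frequency_threshold` is passed explicitly.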
2 changes: 1 addition & 1 deletion wordview/text_analysis/wrapper.py
@@ -118,7 +118,7 @@ def chat(self, api_key: str = ""):
------------------------------
{self.return_stats()}
\n\n
-Answer the questions without adding According to or Based on to the Wordview Analysis dictionary.
+Do NOT say according to Wordview Analysis dictionary.
"""
chat_history = [
{"role": "system", "content": base_content},
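Both `chat` methods start a `chat_history` list whose first entry is `{"role": "system", "content": base_content}`, with the analysis results embedded in `base_content`. The new MWE prompt line asks the model to report association measures with two decimal places; rounding the values before they are embedded is a complementary option, so the model never sees the extra digits. A minimal sketch under those assumptions; `round_measures`, `build_chat_history` and `add_user_turn` are hypothetical helpers, and the prompt wording wordview actually uses is not reproduced here.

```python
import json
from typing import Dict, List

MWEDict = Dict[str, Dict[str, float]]
Message = Dict[str, str]


def round_measures(mwes: MWEDict, ndigits: int = 2) -> MWEDict:
    """Round every association measure, e.g. to two decimal places."""
    return {
        mwe_type: {mwe: round(score, ndigits) for mwe, score in instances.items()}
        for mwe_type, instances in mwes.items()
    }


def build_chat_history(analysis: dict, instructions: str) -> List[Message]:
    """Start an OpenAI-style message list whose system message embeds the analysis."""
    base_content = (
        f"{instructions}\n"
        "------------------------------\n"
        f"{json.dumps(analysis, indent=2)}\n"
    )
    return [{"role": "system", "content": base_content}]


def add_user_turn(history: List[Message], question: str) -> List[Message]:
    """Append a user question; an assistant reply would be appended the same way."""
    history.append({"role": "user", "content": question})
    return history
```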
