This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Support for Arabic & Category collection #109

Draft
wants to merge 138 commits into
base: master
bf854b4
Merge pull request #86 from jsch8q/patch_pronunciation
ZOUHEIRBN Oct 6, 2023
1a07dfe
Fetching words from all Arabic dialects
ZOUHEIRBN Oct 9, 2023
f19ea48
Extracting appendix entities
ZOUHEIRBN Oct 9, 2023
3cafcb1
Collecting language info
ZOUHEIRBN Oct 10, 2023
ff6c256
Merge branch 'Suyash458:master' into master
ZOUHEIRBN Oct 10, 2023
3346d79
Added saving to database
ZOUHEIRBN Oct 12, 2023
29edfa5
Merge branch 'master' of https://github.com/ZOUHEIRBN/WiktionaryParser
ZOUHEIRBN Oct 12, 2023
6bc0078
Implemented database word insertion
ZOUHEIRBN Oct 13, 2023
ee312be
Added load definitions to database table
ZOUHEIRBN Oct 13, 2023
52b0cbe
"Fixed" the definition_appendix insertion issue
ZOUHEIRBN Oct 15, 2023
760a206
Fixed appendix retrieval
ZOUHEIRBN Oct 16, 2023
8e86581
Scraping related_words (still related_index_issue)
ZOUHEIRBN Oct 17, 2023
9338d38
Parsing relationships from .nyms class elements
ZOUHEIRBN Oct 19, 2023
16320e4
Externalized parsing nyms
ZOUHEIRBN Oct 19, 2023
5cd3d4d
Fixed hashcode matching between relations and defs
ZOUHEIRBN Oct 19, 2023
0760406
Enabled saving relationships to database
ZOUHEIRBN Oct 19, 2023
95d791f
Externalized DB connection object
ZOUHEIRBN Oct 19, 2023
1b48449
Completed saving pipeline
ZOUHEIRBN Oct 19, 2023
ff1fe5b
Word2Word graph fetching from DB
ZOUHEIRBN Oct 21, 2023
cc9a1e0
bugfix in relationship parsing (original schema)
ZOUHEIRBN Oct 21, 2023
4cb5bf2
Removed debug info
ZOUHEIRBN Oct 22, 2023
a630261
External processing funcs in 'Collector.save_word'
ZOUHEIRBN Oct 22, 2023
6f988de
Enabled saving related words directly to database
ZOUHEIRBN Oct 22, 2023
2ddef8d
Set wordId as FK in relationships table
ZOUHEIRBN Oct 22, 2023
577ffaf
Externalized database creation query to file
ZOUHEIRBN Oct 23, 2023
d35e79d
Added mentions as relationshipType
ZOUHEIRBN Oct 23, 2023
a819833
Fixed inserting mentions to word table
ZOUHEIRBN Oct 23, 2023
c2bc870
Fixed null wikiUrls in directly collected words
ZOUHEIRBN Oct 23, 2023
cc81d39
Externalized saving to DB in collector.save_word
ZOUHEIRBN Oct 24, 2023
834d926
Fixed a bug in appendix retrieval
ZOUHEIRBN Oct 25, 2023
3caace0
Added script for filling empty words (beta)
ZOUHEIRBN Oct 25, 2023
c9645f6
Fixed fill_empty_words.py
ZOUHEIRBN Oct 25, 2023
9df3cb9
Added dataset uploading script
ZOUHEIRBN Oct 26, 2023
9fe80c5
Added a script that fetches dataset tokens from DB
ZOUHEIRBN Oct 26, 2023
ae0bb78
Added categories to 'Collector.save_word' function
ZOUHEIRBN Oct 26, 2023
adff4c0
Added new type of categories
ZOUHEIRBN Oct 26, 2023
0d5c984
Added graph visualization script
ZOUHEIRBN Oct 27, 2023
c1d9dfd
Embedded graph generation into Builder
ZOUHEIRBN Oct 27, 2023
4daa2fa
Added kwargs to graph builder script
ZOUHEIRBN Oct 27, 2023
67e72bc
Updated layout of graph viz interface
ZOUHEIRBN Oct 27, 2023
9508014
Added categories as nodes in the rendered graph
ZOUHEIRBN Oct 27, 2023
ce818f7
Added def2word function
ZOUHEIRBN Oct 30, 2023
1dff244
Added filter param to Builder.build_graph
ZOUHEIRBN Oct 30, 2023
2fc33b5
Added graph generators for DGL functions
ZOUHEIRBN Oct 30, 2023
85a773c
Implemented adding node features to graph
ZOUHEIRBN Oct 30, 2023
36603b5
Integrated node_ids as member of Builder
ZOUHEIRBN Oct 31, 2023
f86aac8
Renamed Builder class to GraphBuilder
ZOUHEIRBN Oct 31, 2023
b3acfd8
Removed heterograph func, finished homograph func
ZOUHEIRBN Oct 31, 2023
5ed5ae7
Heterograph is working (beta)
ZOUHEIRBN Nov 23, 2023
984bf89
Added tailPOS to def2word (still NULL)
ZOUHEIRBN Nov 23, 2023
cbbab7b
Fixed heterograph generation
ZOUHEIRBN Nov 26, 2023
22dbe63
Added weights to heterograph
ZOUHEIRBN Nov 26, 2023
baa375c
Deorphanization (beta)
ZOUHEIRBN Nov 28, 2023
7fefab7
Fixed deorphanization code
ZOUHEIRBN Nov 29, 2023
7a441f8
Set deorphanize and get_word_info as functions
ZOUHEIRBN Nov 29, 2023
25f5ba4
Made interactive graph visualization code (beta)
ZOUHEIRBN Nov 30, 2023
c7c7da3
Made visualization of Graph script
ZOUHEIRBN Nov 30, 2023
ba82ac0
Extracting examples for each definition
ZOUHEIRBN Dec 3, 2023
a315017
Extracting examples for each definition
ZOUHEIRBN Dec 3, 2023
80b5b3e
Extracting subdefinition
ZOUHEIRBN Dec 3, 2023
b4b266c
Extracting subdefinition
ZOUHEIRBN Dec 3, 2023
4975b3a
Save examples to database
ZOUHEIRBN Dec 4, 2023
5878c57
Bug fix
ZOUHEIRBN Dec 4, 2023
27b4155
Refactoring
ZOUHEIRBN Dec 4, 2023
3a405f3
Refactoring
ZOUHEIRBN Dec 4, 2023
f5d71ba
Grouped pipeline in single main.py file
ZOUHEIRBN Dec 4, 2023
ac1add7
Grouped pipeline in single main.py file (beta)
ZOUHEIRBN Dec 4, 2023
849bd2d
Added wait time for data collection scripts
ZOUHEIRBN Dec 4, 2023
d6ce77b
Added graph export to pipeline
ZOUHEIRBN Dec 4, 2023
8eb4200
Added graph export to pipeline
ZOUHEIRBN Dec 4, 2023
4170490
minor changes
ZOUHEIRBN Dec 4, 2023
11c327c
Fixed appendix regex issue in core.py
ZOUHEIRBN Dec 6, 2023
68ae16c
Reverted faulty data storage optimization
ZOUHEIRBN Dec 8, 2023
3ffe904
Working version 2 (SQL Database)
ZOUHEIRBN Dec 11, 2023
7f4cbf5
Added backup to csv function
ZOUHEIRBN Dec 11, 2023
6f28bde
Fixed unnormalized Arabic punctuation
ZOUHEIRBN Dec 11, 2023
780f35b
Fixed unnormalized Arabic punctuation
ZOUHEIRBN Dec 11, 2023
c85d764
Added features to Preprocessor
ZOUHEIRBN Dec 12, 2023
51053ea
Fixed bug in collect_info function
ZOUHEIRBN Dec 13, 2023
99e5a3f
Added save_mentions param to deorphanize
ZOUHEIRBN Dec 13, 2023
8cefe68
Untracked json folder
ZOUHEIRBN Dec 13, 2023
dcd1ada
Fixed hanging issue when calling collector.flush()
ZOUHEIRBN Dec 13, 2023
ccb65f7
bug fix
ZOUHEIRBN Dec 13, 2023
6dec7ac
Fixed hanging issue when calling collector.flush()
ZOUHEIRBN Dec 14, 2023
0ebad0e
Implemented DatabaseClient class for Collector API (experimental)
ZOUHEIRBN Dec 15, 2023
2ec27d7
Fixed a bug in Collector API
ZOUHEIRBN Dec 15, 2023
2688fdf
Included language in word ID generation
ZOUHEIRBN Dec 15, 2023
ad2fe20
Included language in word ID generation
ZOUHEIRBN Dec 15, 2023
c9cdb8f
Implemented MySQLClient wrapper (beta)
ZOUHEIRBN Dec 22, 2023
1389bd4
Bug fix
ZOUHEIRBN Dec 25, 2023
fa30c61
Integrated DatabaseClient wrapper in FeatureExtractor
ZOUHEIRBN Dec 25, 2023
eab1314
Added LLM based embedding using bert_based_multilingual
ZOUHEIRBN Dec 26, 2023
11adb3b
Fixed syntax related bugs in graph.py
ZOUHEIRBN Dec 28, 2023
b3d66f8
Fixed definitions storing issue due to key duplication (BETA)
ZOUHEIRBN Dec 28, 2023
63dd377
Created data_reporting.py (initial)
ZOUHEIRBN Dec 29, 2023
b890d74
Implemented inwards deorphanization in collect_info
ZOUHEIRBN Dec 29, 2023
34eafa7
Reduced the effect of deorphanization to moderate level
ZOUHEIRBN Jan 1, 2024
02b5074
bug fix upon db creation
benlahmar Jan 2, 2024
8d84af1
Added word exclusion param to get_word_info
ZOUHEIRBN Jan 2, 2024
0885dcb
Created Dev branch
ZOUHEIRBN Jan 2, 2024
8bc7ddd
First deployment
benlahmar Jan 2, 2024
1c16f0c
Merge branch 'master' of https://github.com/ZOUHEIRBN/WiktionaryParser
ZOUHEIRBN Jan 2, 2024
fa8da3f
Fixed memory overloading bug
ZOUHEIRBN Jan 3, 2024
051481f
Fixed memory overloading bug
ZOUHEIRBN Jan 3, 2024
e1ee2c3
Added get_datasets function
ZOUHEIRBN Jan 3, 2024
9026c32
Added dependency graph function to test.py
ZOUHEIRBN Jan 4, 2024
b664e70
Update main.py
ZOUHEIRBN Jan 4, 2024
aeada91
Removed deorphanize.py
ZOUHEIRBN Jan 4, 2024
e70d24d
Removed deorphanize.py
ZOUHEIRBN Jan 4, 2024
9958d07
Merge branch 'master' of https://github.com/ZOUHEIRBN/WiktionaryParser
ZOUHEIRBN Jan 4, 2024
97a139f
Merge branch 'master' of https://github.com/ZOUHEIRBN/WiktionaryParser
ZOUHEIRBN Jan 4, 2024
70c4da9
Working code but still in doubts
ZOUHEIRBN Jan 4, 2024
84ac6f1
Rewritten phase 4 deorphanization (BETA)
ZOUHEIRBN Jan 4, 2024
b8b12f9
Rewritten phase 4 deorphanization (BETA)
ZOUHEIRBN Jan 4, 2024
4624ddc
Merge branch 'master' of https://github.com/ZOUHEIRBN/WiktionaryParse…
benlahmar Jan 5, 2024
9d0cde0
Merged master branch
benlahmar Jan 5, 2024
2aa3afa
Added report.py
benlahmar Jan 5, 2024
61895fe
Added titles to graph nodes
ZOUHEIRBN Jan 6, 2024
c2bea26
staged changes (NOT STABLE)
ZOUHEIRBN Jan 8, 2024
8ff3bdd
Before refactoring changes to relationship collection in core.py
ZOUHEIRBN Jan 9, 2024
84852cf
Added def_id to data collection in core.py
ZOUHEIRBN Jan 9, 2024
38165fe
Changed definition_id generation method
ZOUHEIRBN Jan 9, 2024
0ad6c7c
Fixed unrelated words issue (BETA)
ZOUHEIRBN Jan 9, 2024
20ab3c0
Fixed strange whitespaces in words upon storage
ZOUHEIRBN Jan 9, 2024
3209139
Fixed unrelated words issue (BETA)
ZOUHEIRBN Jan 9, 2024
66086e9
Fixed bug in collector.py
ZOUHEIRBN Jan 9, 2024
6a402d9
Fixed unrelated words issue (CONFIRMED)
ZOUHEIRBN Jan 9, 2024
fb1efeb
Added parameter for dialect inclusion
ZOUHEIRBN Jan 9, 2024
ce85daf
Disabled dialect inclusion in core.py (TEMPORARY)
ZOUHEIRBN Jan 10, 2024
0526794
Fully implemented dialect inclusion in core.py
ZOUHEIRBN Jan 10, 2024
1df6ca7
Fully implemented dialect inclusion in core.py
ZOUHEIRBN Jan 10, 2024
8102b2f
Added filtering to headword-line definitions
ZOUHEIRBN Jan 10, 2024
560c017
Fixed preprocessing bug
ZOUHEIRBN Jan 11, 2024
cfb9f99
Implemented word magnitude calculation (ALPHA)
ZOUHEIRBN Jan 11, 2024
23ac94a
Implemented word magnitude calculation (ALPHA)
ZOUHEIRBN Jan 11, 2024
9ff2554
Merge remote-tracking branch 'Pulled-repo/Dev'
ZOUHEIRBN Jan 11, 2024
2c2d420
Added params to export_graph_viz
ZOUHEIRBN Jan 12, 2024
1ac432c
Implemented def2def (ALPHA)
ZOUHEIRBN Jan 12, 2024
7 changes: 6 additions & 1 deletion .gitignore
@@ -113,4 +113,9 @@ venv.bak/
 dmypy.json
 
 # VSCode
-.vscode/
+.vscode/
+dsInfo.json
+
+backup/
+json/
+.json
Empty file added __init__.py
4,426 changes: 4,426 additions & 0 deletions appendix.json

Large diffs are not rendered by default.
