# Preserve Indigenous Languages

The mission: expand Cortext to 20 other languages and influence "schema.org" to implement this Semantic Calculus across the web.
Powerful LLMs are held by IBM, Google, Twitter, OpenAI (ChatGPT), and others. These companies provide consulting on how to engineer language to best accomplish sophist goals. Cortext is a lightweight LLM that extracts "hashtags" and searches Wikipedia for the most overlapping "Blue Words".

The Cortext LLM is more elegant than "Word2Vec"-style Natural Language Processing, though crudely implemented. It is based on Wierzbicka's Semantic Primes and has the potential to function in 20+ languages.
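In practice, the idea reduces to scoring the overlap between the hashtags extracted from a text and the linked terms ("Blue Words") on candidate Wikipedia pages. Here is a minimal sketch of that scoring; the hashtag extraction and Wikipedia lookup are stubbed with hypothetical data, since the real work happens in the Talend routines described below.

```python
# Minimal "Blue Word" overlap sketch. All data here is hypothetical;
# the real extraction lives in the Talend routines.

def overlap_score(hashtags: set, blue_words: set) -> float:
    """Fraction of extracted hashtags that appear as Blue Words on a page."""
    if not hashtags:
        return 0.0
    return len(hashtags & blue_words) / len(hashtags)

# Hypothetical extraction output for an input document.
hashtags = {"actuarial", "risk", "premium", "liability"}

# Hypothetical Blue Words (linked terms) for two candidate Wikipedia pages.
candidates = {
    "Actuary": {"risk", "premium", "insurance", "statistics"},
    "Liability insurance": {"liability", "premium", "policy"},
}

# Rank candidate pages by how much they overlap with the hashtags.
ranked = sorted(candidates, key=lambda p: overlap_score(hashtags, candidates[p]), reverse=True)
for page in ranked:
    print(page, overlap_score(hashtags, candidates[page]))
```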
Thank you, Mat Allen, for all the contributions!
## Create the image

```bash
sudo docker build -t cortext-python310 .
sudo docker run -it -p 5000:5000 --name cortext-python310-fastmcp cortext-python310
```
## Inside the image, run

```bash
# Start the API server
python3 ./cortext_io/api/app.py
```

```bash
# Run the pipeline directly; the bracketed arguments are placeholders
./text.sh [Title] [Email] [Email?]
```

Or hand the container a command array (e.g., a Docker CMD) that writes the input text and kicks off the full run:

```json
["/bin/bash","-c","echo '[This is clean text that will not have encoding issues in a linux command line, and is also ~7800 characters]' > /home/ec2-user/cortext_io/cortext_io_input/input.txt && rm /home/ec2-user/cortext_io/cortext_io_input/input2.txt && /home/ec2-user/cortext_io/10K_RiskFactors_PROCESS/1_CORTEXT_Run.sh [Title] [Email] ['Yes'/'No']"]
```
You will need to replace the emails, usernames, and passwords in the following locations for this to actually send out content:
| Purpose | Location |
|---|---|
| Send Email Report with SMTP | `cortext_io/10K_RiskFactors_PROCESS/sendEmail.py` |
| Push content to your Elasticsearch instance | `cortext_io/10K_RiskFactors_PROCESS/zzz_pushElastic.py` |
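The placeholders in sendEmail.py amount to standard SMTP settings. Below is a minimal sketch of what sending with Python's smtplib looks like; the host, credentials, and recipient are hypothetical, and this is not the actual contents of sendEmail.py.

```python
# Minimal SMTP sketch with hypothetical settings; make the equivalent
# changes inside sendEmail.py rather than using this file directly.
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp.example.com"       # hypothetical: your SMTP server
SMTP_PORT = 587
SMTP_USER = "user@example.com"       # hypothetical: your username
SMTP_PASS = "app-password"           # hypothetical: your password

msg = EmailMessage()
msg["Subject"] = "Cortext Report"
msg["From"] = SMTP_USER
msg["To"] = "recipient@example.com"  # hypothetical recipient
msg.set_content("Report body goes here.")

with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
    server.starttls()                # upgrade the connection before logging in
    server.login(SMTP_USER, SMTP_PASS)
    server.send_message(msg)
```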
I recommend opening the SQLite database file in a tool like DBeaver to understand the data model, or see the UML diagram here: https://www.cortext.io/how-it-works (a quick command-line inspection sketch follows the table below).
| Purpose | Location |
|---|---|
| Relational Data Structure for Text Decomposition | `cortext_io/cortext_io_db/cortext_io.db` |
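If you would rather not install DBeaver, Python's standard-library sqlite3 module can dump the table names and schemas directly; nothing about the schema is assumed here, it just prints whatever the database contains.

```python
# Print every table and its CREATE statement from the Cortext database.
import sqlite3

conn = sqlite3.connect("cortext_io/cortext_io_db/cortext_io.db")
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for name, sql in rows:
    print(name)
    print(sql, end="\n\n")
conn.close()
```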
The LLM has gone through multiple languages and architectures (not to say they got progressively better). I landed on Talend... search the routines directory below to see where the hashtag extraction takes place.
| Purpose | Location |
|---|---|
| "Scripts" in Talend Workflow | `cortext_io/AA_cortext_io_linux_0.1/AA_cortext_io_linux/items/integralmass/code/routines` |
A Python perceptron is used to clean the noise, trained on hand-selected examples of the best hashtag extractions from 1000 actuarial articles.
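For readers unfamiliar with the technique, the sketch below shows the classic perceptron update rule applied to a keep/drop decision per hashtag. The features, weights, and training examples are hypothetical stand-ins; the real model and its hand-labeled data live in the repository's Python step.

```python
# Perceptron sketch for keep/drop filtering of extracted hashtags.
# Features per hashtag are hypothetical: (term frequency, overlap score, length).

def train(samples, labels, epochs=20, lr=0.1):
    """Classic perceptron rule: nudge the weights on every misclassification."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):   # y is +1 (keep) or -1 (drop)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

samples = [(3, 0.8, 9), (1, 0.1, 4), (5, 0.9, 12), (1, 0.2, 3)]  # hypothetical
labels = [1, -1, 1, -1]                                          # hand labels

w, b = train(samples, labels)
print("weights:", w, "bias:", b)
```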
It is my current understanding that email bodies cannot contain executable JavaScript, so the Associative Web (a force-directed node graph) is just hanging there in the emailed report.
- https://perrydime.com/Begin_With_The_End_In_Mind.pdf
- https://nlpbigdata.jeffersonrichards.com/
- https://jefferson.cloud
- https://richards.plus
- https://richards.systems