Limitations

Connor Killingbeck edited this page Mar 27, 2024 · 21 revisions

Main Author: Connor Killingbeck

Main Limitations

LLM Based Analytics 💭

The LLM is not perfect in its understanding, and as such it may produce results that are incorrect or hard to understand. Since our rewards are generated from this generalized data, they have the potential to be formulaic or generalized.

RAM Constraints 🐏

The clustering process used to find similarities between submissions is currently very expensive. It takes a large amount of RAM, and while this is not an issue at this point in time, there will be speed concerns should the database grow too large.
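To give a rough sense of how this cost grows, here is a minimal sketch. It assumes the clustering step holds a dense pairwise similarity matrix of 64-bit floats in memory, which is an assumption made for illustration and may not match the actual implementation. The function name `pairwise_matrix_bytes` is hypothetical.

```python
def pairwise_matrix_bytes(num_submissions: int, bytes_per_value: int = 8) -> int:
    """Estimate the bytes needed for a dense n x n similarity matrix.

    Assumption: one float64 (8 bytes) is stored per pair of submissions.
    """
    if num_submissions < 0:
        raise ValueError("num_submissions must be non-negative")
    return num_submissions * num_submissions * bytes_per_value


if __name__ == "__main__":
    # Print the estimate for a few hypothetical database sizes.
    for n in (1_000, 10_000, 100_000):
        gib = pairwise_matrix_bytes(n) / 2**30
        print(f"{n:>7} submissions -> {gib:.2f} GiB")
```

Under this assumption, memory grows with the square of the number of submissions, so ten times as many submissions would need roughly a hundred times as much RAM.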

Token Count per Submission 🥇

Continuing from the above, there is also a limit to how many tokens we can use within the model itself. In essence, this means we are relying on the contributor to be concise and accurate with their messages. When parsing and handling data to send to an AI, the AI can only "remember" so much, and as such, very long, unfocused submissions may find that the final parts of their submission are not touched on when they get to the rewards section. This limit is dictated by many factors, such as model speed, hardware components, and technological advancement. At present, the number of tokens at our disposal should be more than enough for the average contribution.
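The effect of a token budget on a long submission can be sketched as follows. This is an illustration only: whitespace-separated words stand in for model tokens (a real tokenizer counts differently), and the function name `truncate_to_budget` is hypothetical rather than part of the project.

```python
from typing import Tuple


def truncate_to_budget(text: str, max_tokens: int) -> Tuple[str, str]:
    """Split text into the part that fits the budget and the part that is cut.

    Assumption: each whitespace-separated word counts as one token.
    """
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    words = text.split()
    kept = " ".join(words[:max_tokens])
    dropped = " ".join(words[max_tokens:])
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = truncate_to_budget("first point then a late second point", 4)
    print("seen by the model:", kept)
    print("never seen:", dropped)
```

Whatever falls past the budget never reaches the model, which is why the end of a very long submission may not be reflected in the rewards.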

No Cookies 🍪

We use session tokens (a.k.a. cookies) exclusively for authentication purposes. As a result, data entered inside the chat is lost should the page crash or be forcibly refreshed. Additionally, it's worth noting that cookies are slowly being deprecated by many popular Chromium-based browsers, so another method of authentication will have to be looked into in the future.

Randomized Reward 📊

In terms of intentionally implemented randomness, the WordCloud places words at random positions with random colors, within bounds. As such, if a user's submission uses exclusively long and complex words (for example, they copy and paste the 25 longest words in the English language), not all of the words may be visible in the WordCloud, as they may not all fit inside the canvas.
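The behaviour described above can be sketched as a simple random placement loop. This is not the project's WordCloud code. It is a minimal illustration that assumes fixed-width characters, rectangular word boxes, and a limited number of placement attempts per word. Colors are omitted, and all names (`place_words`, `overlaps`, `char_w`, `line_h`) are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_words(
    words: List[str],
    canvas_w: int,
    canvas_h: int,
    char_w: int = 10,
    line_h: int = 20,
    attempts: int = 50,
    seed: Optional[int] = None,
) -> Tuple[List[Tuple[str, Box]], List[str]]:
    """Place words at random positions inside the canvas without overlap.

    Words that are wider than the canvas, or that find no free spot within
    the allowed number of attempts, are returned in the skipped list.
    """
    rng = random.Random(seed)
    placed: List[Tuple[str, Box]] = []
    skipped: List[str] = []
    for word in words:
        w, h = len(word) * char_w, line_h
        if w > canvas_w or h > canvas_h:
            skipped.append(word)  # can never fit, whatever the position
            continue
        for _ in range(attempts):
            box = (rng.randint(0, canvas_w - w), rng.randint(0, canvas_h - h), w, h)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((word, box))
                break
        else:
            skipped.append(word)  # no free position found in time
    return placed, skipped
```

In this sketch a very long word is skipped outright when it is wider than the canvas, which mirrors why a submission made only of long words may show an incomplete WordCloud.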

Notable Edge Cases 🗻

While edge cases have been tested extensively, how we handle them could be seen as a limitation. Should a user start a chat with the AI and then instantly hit submit, or should they not put in enough information, the rewards page will simply not generate anything. This is intentional, as all rewards resources check whether the available data is sufficient before calculation, but it would have been possible to show temporary data or a "not enough data" exception instead. In practice, virtually all submissions will have enough data to generate front-end rewards from.
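The "check before calculation" pattern described above can be sketched like this. The thresholds, the function names, and the shape of the reward are all hypothetical and chosen only for illustration. The source does not state what counts as sufficient data.

```python
from typing import Dict, List, Optional

# Hypothetical thresholds, not taken from the project.
MIN_USER_MESSAGES = 2
MIN_TOTAL_WORDS = 20


def has_enough_data(user_messages: List[str]) -> bool:
    """Return True if the chat holds enough text to build a reward from."""
    total_words = sum(len(message.split()) for message in user_messages)
    return len(user_messages) >= MIN_USER_MESSAGES and total_words >= MIN_TOTAL_WORDS


def build_reward(user_messages: List[str]) -> Optional[Dict[str, int]]:
    """Return a placeholder reward, or None when there is too little data."""
    if not has_enough_data(user_messages):
        return None  # the rewards page renders nothing in this case
    total_words = sum(len(message.split()) for message in user_messages)
    return {"word_count": total_words}
```

Returning `None` corresponds to the current behaviour of generating nothing. The alternative mentioned above would replace that branch with placeholder data or an explicit "not enough data" response.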

Multi-Language Support 🗣

At present, multi-language support does not apply fully to the clustering algorithm. Both embeddings and LLMs are language-model specific, so the same text in different languages does not produce equivalent embeddings. The semantics of embeddings also usually differ completely between LLMs, so embeddings from different models cannot be clustered together.

More on this issue specifically and how to maintain it can be found here: https://github.com/mustafa-tariqk/mindscape/wiki/Maintenance#multi-language-support

Security Measures

Login Functionality 🤵

While at first we did not want to implement a required login feature due to privacy concerns, we later decided that, as a security measure against spam submissions, we would require contributors to provide a valid email, which we use to link their submissions to them. It should be noted that we do not and will not use this email for any purpose other than linking it to a chat.

Chat Pipeline 🛒

The chat pipeline, as previously mentioned, consists of logging in, chatting with the bot, and then receiving your reward. This pipeline is intentional: should someone simply try to force their way to the rewards screen, they will not be able to. The only way through the pipeline is to create a new chat after logging in, with a new and specific chat ID.
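As an illustration of this gating, the sketch below keeps an in-memory record of chats and only allows the rewards step for a chat that was created after login and then submitted. It is a simplified stand-in, not the project's implementation. The class `ChatPipeline`, its methods, and the in-memory dictionary are all hypothetical.

```python
import uuid
from typing import Dict, Optional


class ChatPipeline:
    """Minimal gate: login -> new chat ID -> submit -> rewards."""

    def __init__(self) -> None:
        # Maps chat ID to its owner and submission state.
        self._chats: Dict[str, dict] = {}

    def start_chat(self, user_email: Optional[str]) -> str:
        """Create a fresh chat ID for a logged-in user."""
        if not user_email:
            raise PermissionError("login required before starting a chat")
        chat_id = uuid.uuid4().hex
        self._chats[chat_id] = {"owner": user_email, "submitted": False}
        return chat_id

    def submit(self, chat_id: str, user_email: str) -> None:
        """Mark a chat as submitted, but only for its owner."""
        chat = self._chats.get(chat_id)
        if chat is None or chat["owner"] != user_email:
            raise PermissionError("unknown chat or wrong user")
        chat["submitted"] = True

    def can_view_rewards(self, chat_id: str, user_email: str) -> bool:
        """Rewards are reachable only via a known, owned, submitted chat."""
        chat = self._chats.get(chat_id)
        return chat is not None and chat["owner"] == user_email and chat["submitted"]
```

In this sketch, guessing a chat ID or skipping the submit step leaves `can_view_rewards` returning `False`, which is the behaviour the pipeline is meant to enforce.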

Front End Specific Limitations

Text Field Handling 📱

While a great deal of effort has gone into the text fields and how they handle overflow, it should be stated that there are some alternatives that could have been implemented. Mainly, the text field for the "send" box is a single-line textbox that spans from the left to the send box. While it would have been possible to make the field scale to the size of the send box, issues arose with window scaling and the usability of the textbox.

Privacy and Feedback

Previously Stated 🕒

As noted in the previous sections, Google authentication is used only to make sure the user has a valid Google account. Additionally, submission rewards are available to users only a single time per submission. This is to make sure that their entries are protected from being viewed by other users.

Feedback Form 📃

Finally, our bot is by no means perfect. It is filled with errors, mistakes, and strange quirks, as all programming projects are. Should any more notable limitations or issues arise, we have added a Google Form linked to the Neuma mindscape Google account. Users can submit feedback on the product through this form, while owners and developers will be able to see the likes and dislikes, wants and needs of users immediately after they submit their conversation with the bot.