# **1. Business Understanding**

## **Problem**

With the arrival of LLMs (Large Language Models), it has become very common for people to use AI systems, like ChatGPT, to create text content for them. This content can be used for doing homework, writing an article, creating a portfolio description, generating names, creating stories, simulating conversations, and thousands of other things.

The texts generated by these systems are well written and can easily pass as text written by a person. That being said, the **objective of this project is to develop a solution that can distinguish whether a text was written by a human or a machine.**

## **Final Users**

The final user can be anyone who doubts, for any reason, whether a text was written by a machine. The clearest example is education professionals evaluating students' responses; in this case, the tool can be very useful (as long as it doesn't make too many errors).

## **Implications**

Naturally, the tool will sometimes give a wrong result, which can cause problems (especially in the education example above). A possible mitigation, though an imperfect one, is to avoid telling the final user `"Text Generated by Machine"` or `"Text Generated by Human"` outright and instead report how confident the model is that the text was generated by a machine.

As I said before, that's not a perfect solution, so the final user must be told that the tool is not perfect, that errors can occur, and that the product is not recommended for critical situations.
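To illustrate the idea of reporting a confidence instead of a hard label, here is a minimal sketch. It assumes the classifier outputs a probability between 0 and 1 that the text is machine-generated; the function name `describe_prediction`, the 0.5 cut-off, and the wording of the message are hypothetical choices for illustration, not the final product's behavior.

```python
def describe_prediction(machine_probability: float) -> str:
    """Turn a model probability into a hedged, user-facing message.

    `machine_probability` is assumed to be the model's estimated
    probability that the text was generated by a machine.
    """
    if not 0.0 <= machine_probability <= 1.0:
        raise ValueError("machine_probability must be between 0 and 1")

    # Report whichever class is more likely, together with its confidence.
    # The 0.5 cut-off is an assumption made for this sketch.
    if machine_probability >= 0.5:
        source, confidence = "a machine", machine_probability
    else:
        source, confidence = "a human", 1.0 - machine_probability

    return (
        f"This text is more likely to have been written by {source} "
        f"(confidence: {confidence:.0%}). "
        "This estimate can be wrong and should not be used for critical decisions."
    )


print(describe_prediction(0.87))
```

Keeping the warning inside the message itself means the user sees the limitation every time a prediction is shown, not only in the documentation.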

## **Solution**

The solution will be a classification model that predicts whether a text was written by a human or a machine. The final product will be a web application running on Streamlit Cloud, where anyone can access the tool.

## **Methodology**

This entire project follows the CRISP-DM methodology. For a better understanding, I'll be dividing the work into the following notebooks:

1. **Business Understanding** (this notebook)

  * In this notebook, the problem is discussed from a general perspective and the organization of the project is described.

2. **Exploratory Data Analysis**

  * At this stage, the data available for training the model is analyzed for insights that can be used to improve the final ML model's accuracy.
  
3. **Data Preparation**

  * Before the modeling stage, the data must be transformed into a valid format that the models can use.

4. **Modeling**

  * Finally, at this stage, the best possible model is trained to solve our problem.

## **Deployment**

Once the best model has been tuned, I'll develop the web application with Streamlit and build the entire pipeline for predicting whether a text was written by a human or a machine.