GPThreatIntel-Summarizer is a Python-based repository that uses OpenAI's GPT (Generative Pre-trained Transformer) models to automatically summarize Cyber Threat Intelligence (CTI) reports. The tool simplifies the extraction of key insights from CTI reports, enabling cyber threat analysts to generate concise, informative summaries for upper management.
- Utilizes OpenAI GPT models for natural language processing and summarization tasks.
- Extracts relevant text from CTI reports using BeautifulSoup & pdfplumber.
- Generates summarized reports based on user-defined length or word count.
- Extracts Indicators of Compromise (IOCs) and Tactics, Techniques, and Procedures (TTPs) from reports.
- Provides an intuitive web interface powered by FastAPI for easy interaction and display of results.
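The IOC-extraction feature listed above could be sketched with simple regular expressions. The patterns and the `extract_iocs` helper below are illustrative assumptions, not the repository's actual implementation:

```python
import re

# Illustrative IOC patterns (lowercase input assumed; a real extractor
# would handle defanged indicators like hxxp:// and [.] as well).
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|io)\b"),
}

def extract_iocs(text):
    """Return a dict of IOC type -> sorted, de-duplicated matches."""
    return {name: sorted(set(p.findall(text))) for name, p in IOC_PATTERNS.items()}
```

Because word boundaries cannot occur between two hex characters, the 32-character MD5 pattern will not fire inside a 64-character SHA-256 string.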
To get started with GPThreatIntel-Summarizer, follow these steps:

1. Clone the repository:

       git clone https://github.com/yourusername/GPThreatIntel-Summarizer.git

2. Install the required dependencies:

       pip install -r requirements.txt

3. Run the application:

       python app.py

4. Access the web interface in your browser at http://localhost:5001.
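Based on the libraries named in this README, requirements.txt would contain roughly the following (uvicorn is an assumption, commonly used to serve FastAPI apps; versions omitted):

```text
openai
fastapi
uvicorn
beautifulsoup4
pdfplumber
```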
- Alternatively, use the hosted version at https://gp-threat-intel-summarizer.vercel.app/
- Enter your OpenAI API key (generated from your OpenAI account dashboard).
- Enter the URL or paste the text content of the CTI report in the provided text field.
- Alternatively, you can upload a PDF file.
- Choose your GPT model.
- Specify the desired length or word count for the summary.
- Click the "Summarize" button to generate a summary of the report.
- The extracted IOCs and TTPs will be displayed below the summarized report.
- Parse IOCs from an image
- Use LangChain to help with text embeddings & vector storage
- Built partly as a way to try out the OpenAI API & FastAPI
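The embedding-and-vectors idea above boils down to retrieving the most relevant report chunks by vector similarity. A minimal stdlib sketch of the retrieval step (the helper names are hypothetical; a real version would get vectors from an embedding model rather than hand-written lists):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    scored = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return scored[:k]
```

With embeddings in place, only the retrieved chunks need to be sent to the model, which also helps with the token limit discussed below.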
- OpenAI
- FastAPI
- TailwindCSS
- The OpenAI models have limitations, most notably the number of tokens they can process. The base model, GPT-3.5 Turbo, has a context limit of 4,097 tokens (a token is roughly three-quarters of an English word).
- If the text content a user submits is larger than that limit, the model cannot process it. Possible workarounds:
- Implement text embedding and chunking on the developer side
- Use an OpenAI model with a larger context window, e.g. GPT-3.5 Turbo (16k) or GPT-4
- More information can be found in OpenAI's documentation.
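One way to stay under the token limit is to split the report into chunks before sending them to the model. The `chunk_text` helper below is a hypothetical sketch, assuming a rough heuristic of about four characters per token and reserving some budget for the prompt and response:

```python
def chunk_text(text, max_tokens=4097, chars_per_token=4, reserve=500):
    """Split text on word boundaries into chunks that fit an
    approximate token budget (heuristic: ~4 chars per token)."""
    budget_chars = (max_tokens - reserve) * chars_per_token
    words = text.split()
    chunks, current, size = [], [], 0
    for w in words:
        # Flush the current chunk before it would exceed the budget.
        if size + len(w) + 1 > budget_chars and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(w)
        size += len(w) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk could then be summarized separately and the partial summaries combined in a final pass (a map-reduce style approach). An exact tokenizer such as OpenAI's tiktoken would give tighter budgets than this character heuristic.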
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request.
This project is licensed under the MIT License.
GPThreatIntel-Summarizer empowers cybersecurity professionals to efficiently analyze and communicate critical CTI findings, enhancing decision-making processes and improving organizational security.