This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

Implemented HTML Parsing with Regex Filters for FBI API JSON Extraction #3

Merged
merged 1 commit into from
May 13, 2023


PakCyberbot
Contributor

@PakCyberbot PakCyberbot commented May 13, 2023

Implemented regex filters to improve HTML parsing for the FBI API, handling unexpected HTML responses and extracting JSON data. These filters enhance data extraction reliability and versatility, ensuring our application interacts smoothly with the API. They seamlessly integrate with existing data processing workflows, providing a robust solution for parsing HTML and extracting JSON data.

I have left some regex as comments in case the API returns HTML data again, and I will update the filters accordingly if needed.

You can check the results below:
With HTML parsing applied, fbi-api can now fetch data from an HTML page.
HTML data
https://api.fbi.gov/@wanted-person/b166d627e11149aa82c02d1533e3b650
[Screenshot: test.py in the fbi-wanted project, Visual Studio Code]
[Screenshot: Mozilla Firefox]

The library supports both HTML and JSON responses.
JSON data
https://api.fbi.gov/@wanted-person/b4db60d7ed14449cb73ba39564ef9e19
[Screenshot: test.py in the fbi-wanted project, Visual Studio Code]
[Screenshot: Mozilla Firefox]

This PR introduces a set of regex filters to improve our HTML parsing when dealing with the FBI API. In certain cases, the API unexpectedly returns HTML content instead of the expected JSON. With these regex filters in place, we can extract the JSON data from the HTML response.

By implementing these filters, we enhance the reliability and versatility of our data extraction process, ensuring that we can handle a wider range of API responses. This enhancement strengthens the overall stability and usability of our application when interacting with the FBI API.

The new regex filters provide a robust solution for parsing HTML and extracting the required JSON data, allowing for seamless integration with our existing data processing workflows.
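
The fallback described above can be sketched roughly as follows. This is only an illustration, not the code merged in this PR: the function name `extract_json` and both patterns are hypothetical, and the sketch assumes the JSON payload appears HTML-escaped inside a `<pre>` element (or, failing that, as the outermost `{...}` span in the page).

```python
import html
import json
import re

# Hypothetical pattern: assumes the JSON payload sits inside a <pre> element.
PRE_BLOCK = re.compile(r"<pre[^>]*>(.*?)</pre>", re.DOTALL | re.IGNORECASE)
# Fallback pattern: the outermost {...} span in the text.
BRACES = re.compile(r"\{.*\}", re.DOTALL)


def extract_json(body: str):
    """Parse body as JSON; if that fails, pull a JSON object out of HTML."""
    try:
        # Normal case: the API returned plain JSON.
        return json.loads(body)
    except json.JSONDecodeError:
        pass

    # HTML case: narrow the search to the <pre> block when one exists.
    match = PRE_BLOCK.search(body)
    candidate = match.group(1) if match else body

    braces = BRACES.search(candidate)
    if braces is None:
        raise ValueError("no JSON object found in response body")

    # Undo HTML escaping (&quot;, &amp;, ...) before decoding.
    return json.loads(html.unescape(braces.group(0)))
```

Calling `extract_json(response_text)` on either kind of response returns the decoded data, so the rest of the processing code does not need to know which format the API sent. A regex approach like this is fragile if the page layout changes, so the patterns would need updating to match whatever HTML the API actually returns.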
@rly0nheart rly0nheart merged commit e770b8b into rly0nheart:master May 13, 2023
@PakCyberbot
Contributor Author

Thank you, SIR, for the opportunity to learn!

I did not understand why you dissolved your FBI-Mostwanted tool into an API. You could have created it separately and kept the tool, as it is even used by the CSI Linux distro. I will try to create the tool again and will give you credit for the tool.

@rly0nheart
Owner

> Thank you, SIR, for the opportunity to learn!
>
> I did not understand why you dissolved your FBI-Mostwanted tool into an API. You could have created it separately and kept the tool, as it is even used by the CSI Linux distro. I will try to create the tool again and will give you credit for the tool.

Sure thing! Do whatever you want with it mate :)

I decided to make it a library so that people can create better tools with it, instead of just using it as a single tool. In this case, you might be the one to create that better tool.
Making it a library is more of a window for other developers to create something of their own with it.

Good luck, let me know if you need any help.

@PakCyberbot
Contributor Author

You have already been a great help!
If I need any assistance in the future, I will definitely reach out to you!
Thanks

@rly0nheart
Owner

> You have already been a great help!
> If I need any assistance in the future, I will definitely reach out to you!
> Thanks

Alright mate! :)
