This Python script 🐍 fetches data from a specified URL 🌐 in various formats (JSON, CSV, YAML, XML) and stores it in a timestamp-based directory structure 📅. The script uses the `requests` 📡, `csv` 📊, `json` 📋, `yaml` 🧾, `xml.etree` 🌳, `datetime` 🕰️, and `schedule` 🕒 modules for data retrieval, processing, and scheduling.
- Python 3.x
- Dependencies (install via `pip install -r requirements.txt`)
- Clone the repository:

  ```
  git clone https://github.com/rikinptl/user-data-classify.git
  cd user-data-classify
  ```

- Run the script:

  ```
  python script.py
  ```
- The script prompts you to enter the desired file format (csv, json, yaml, xml) to fetch.
- It fetches data from a specified URL (https://randomuser.me/api/) in the chosen format.
- The data is stored in a timestamp-based directory structure.
- The script is scheduled to run every 3 hours, allowing for periodic data retrieval and storage.
- `__init__(self, url: str, file_type: str)`: Initializes the classifier with a URL and file type.
- `pathmaker(self) -> str`: Generates a timestamp-based directory and file name.
- `fetchdata(self) -> requests.Response`: Fetches data from the specified URL in the chosen format.
- `storedata(self, data: requests.Response, file_path: str) -> None`: Stores the fetched data in the specified file format (JSON, CSV, YAML, XML).
- `job(fileformat) -> None` (static method): Prompts the user for the desired file format, creates a classifier object, then fetches and stores the data.
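As an illustration of the directory logic, here is a minimal sketch of how a `pathmaker`-style helper could build a timestamp-based path. The exact layout (year/month/day folders, a time-stamped file name) is an assumption for illustration, not necessarily the script's actual implementation:

```python
import os
from datetime import datetime

def pathmaker(file_type: str, base_dir: str = "data") -> str:
    """Build a timestamp-based path such as data/2024/05/17/13-45-09.json.

    The layout here is illustrative; the script's own pathmaker may
    structure its directories differently.
    """
    now = datetime.now()
    directory = os.path.join(
        base_dir, now.strftime("%Y"), now.strftime("%m"), now.strftime("%d")
    )
    os.makedirs(directory, exist_ok=True)  # create the dated folders on demand
    return os.path.join(directory, now.strftime("%H-%M-%S") + "." + file_type)
```

Deriving every path from a single `datetime.now()` call keeps the directory and file name consistent even if the clock ticks over mid-run.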
The script is configured to run periodically using the `schedule` module, ensuring automated and consistent data retrieval and storage.

- The `job` function is scheduled to run every 3 seconds (adjustable) using the `schedule.every(3).seconds` syntax.
- The script continuously checks for pending scheduled jobs with `schedule.run_pending()`.
- The script remains active, allowing scheduled jobs to execute at their designated intervals.

Feel free to modify the schedule parameters based on your specific requirements.
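The run-pending pattern that `schedule` provides boils down to a simple loop. Here is a standard-library-only stand-in (the interval and iteration cap are illustrative; the real script loops indefinitely):

```python
import time

def run_every(interval_seconds: float, job, iterations: int) -> None:
    """Minimal stand-in for schedule.every(n).seconds.do(job) plus the
    run_pending() loop: invoke job once per interval, a fixed number of times.
    The iteration cap exists only so this sketch terminates."""
    for _ in range(iterations):
        job()
        time.sleep(interval_seconds)
```

With the `schedule` module itself, the equivalent is `schedule.every(3).seconds.do(job)` followed by a loop that calls `schedule.run_pending()` and sleeps briefly between checks.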
- Exception Handling:
  - Implemented handling for module-not-found errors and general exceptions.
  - Consider catching specific exception types for better error diagnostics.
- Dynamic Code Execution:
  - Creative use of a dictionary and dynamic code execution (`exec`) for the different file formats.
- File Handling:
  - Effective handling of file creation and directory structuring based on timestamps.
- User Input Validation:
  - Good practice of validating user input for the desired file format.
- Scheduling:
  - Use of the `schedule` module for periodic data retrieval and storage is valuable for automation.
- Repetition of Code:
  - Repetition in fetching, storing, and scheduling; consider refactoring to avoid redundancy.
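One hedged sketch of such a refactor: replace the per-format branches (and the `exec` call) with a dispatch table that maps each format to a writer function. The writer functions below are illustrative stand-ins, not the script's own code:

```python
import json

def write_json(text: str, path: str) -> None:
    # Re-serialize to validate and pretty-print the JSON payload
    with open(path, "w") as f:
        json.dump(json.loads(text), f, indent=2)

def write_raw(text: str, path: str) -> None:
    # CSV, YAML, and XML response bodies can be written through unchanged
    with open(path, "w") as f:
        f.write(text)

# One entry per supported format replaces four near-identical branches
WRITERS = {"json": write_json, "csv": write_raw, "yaml": write_raw, "xml": write_raw}

def storedata(text: str, file_type: str, path: str) -> None:
    WRITERS[file_type](text, path)  # a KeyError doubles as input validation
```

Adding a new format then means adding one dictionary entry rather than another branch, and no dynamically executed strings are involved.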
- User Interaction:
  - Relying on continuous user input; consider adding more user-friendly interactions and options.
- Logging:
  - Incorporate logging to record detailed information about each execution for easier debugging.
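A minimal sketch of what that logging could look like; the logger name, format string, and messages here are placeholders, not anything the script currently defines:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("user-data-classify")  # placeholder logger name

def job(fileformat: str) -> None:
    log.info("fetch cycle started: format=%s", fileformat)
    try:
        pass  # the fetch and store steps would run here
    except Exception:
        log.exception("fetch cycle failed")  # records the full traceback
    else:
        log.info("fetch cycle finished: format=%s", fileformat)
```

`log.exception` inside an `except` block captures the traceback automatically, which is exactly the detail that is lost when errors are only printed.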
- Code Structure:
  - Break the script into smaller functions or classes for better maintainability.
- Exit Mechanism:
  - Use a graceful exit mechanism instead of calling `exit()` in exception handlers.
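One possible shape for this, with illustrative messages and status codes: let a `main` function return an exit status and reserve the actual exit for the very top level, rather than calling `exit()` inside exception handlers:

```python
import sys

def main() -> int:
    """Return an exit status instead of terminating from within handlers."""
    try:
        # the fetch/store cycle would run here
        return 0
    except ModuleNotFoundError as exc:
        print(f"missing dependency: {exc.name}", file=sys.stderr)
        return 1
    except Exception as exc:
        print(f"unexpected error: {exc}", file=sys.stderr)
        return 2
```

A script would then end with `sys.exit(main())`, keeping the exit path in one place and making `main` easy to call from tests without killing the interpreter.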
- User Configurability:
  - Explore options to allow users to configure aspects like the URL or scheduling intervals.
- Testing:
  - Implement unit testing to ensure the correctness of individual components and functions.
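A minimal `unittest` sketch of what such a test could look like. The `make_path` helper below is a self-contained stand-in for the script's `pathmaker`, with the clock injected so the output is deterministic:

```python
import os
import unittest
from datetime import datetime

def make_path(file_type: str, now: datetime) -> str:
    # Stand-in for the script's pathmaker, made testable by injecting the clock
    return os.path.join(
        "data", now.strftime("%Y"), now.strftime("%m"), now.strftime("%d"),
        now.strftime("%H-%M-%S") + "." + file_type,
    )

class PathTests(unittest.TestCase):
    def test_extension_matches_format(self):
        stamp = datetime(2024, 1, 2, 3, 4, 5)
        self.assertTrue(make_path("json", stamp).endswith(".json"))

    def test_timestamp_layout(self):
        stamp = datetime(2024, 1, 2, 3, 4, 5)
        expected = os.path.join("data", "2024", "01", "02", "03-04-05.csv")
        self.assertEqual(make_path("csv", stamp), expected)
```

Run with `python -m unittest`. Passing the timestamp in as a parameter, rather than calling `datetime.now()` inside the function, is what makes the assertion on the exact path possible.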
- Parallel Processing:
  - Explore parallel data fetching for potential performance improvements.
- Documentation:
  - Add comments within the code to explain complex logic or functionality.
- Code Style:
  - Adhere to consistent code style conventions for improved readability.
🍏 Feel free to customize and enhance the script based on these insights! 🍎