The Log Analysis Script is a Python tool that processes web server log files and extracts key information. It aggregates the traffic and security events recorded in the logs, prints the results to the terminal, and saves them to a CSV file.
- Count Requests Per IP Address:
  - Aggregates the total number of requests made by each IP address.
  - Displays results in descending order of request count.
- Identify the Most Frequently Accessed Endpoint:
  - Analyzes the log to find the endpoint (URL or resource path) that was accessed the most.
  - Displays the endpoint name and the number of times it was accessed.
- Detect Suspicious Activity:
  - Identifies IP addresses with failed login attempts exceeding a configurable threshold.
  - Flags potential brute force attacks and displays the failed login count.
- Export Results to CSV:
  - Saves all analyzed data to a CSV file for future reference.
Ensure you have the following installed:
- Python 3.7 or later
- Required libraries:
  - pandas
You can install the necessary library by running:
pip install pandas
- Log Parsing:
  - The script reads the log file line by line.
  - Regular expressions are used to extract IP addresses, endpoints, and HTTP status codes.
- Data Aggregation:
  - Uses Python's Counter and pandas to count occurrences of IP addresses and endpoints.
  - Detects suspicious activity based on failed login attempts (401 or "Invalid credentials").
- Output:
  - Displays results in the terminal in a structured format.
  - Saves results to a CSV file (log_analysis_results.csv).
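The parsing and counting steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the log line layout (Common Log Format style), the `LOG_PATTERN` regex, and the helper names `parse_line` and `count_requests` are all assumptions, and the sketch uses only `Counter` rather than pandas.

```python
import re
from collections import Counter

# Assumed Common Log Format-style layout; the real script's pattern may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) .*?'
    r'"(?:[A-Z]+) (?P<endpoint>\S+) [^"]*" (?P<status>\d{3})'
)


def parse_line(line):
    """Return (ip, endpoint, status) for a matching line, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match.group("ip"), match.group("endpoint"), int(match.group("status"))


def count_requests(lines):
    """Count requests per IP address and per endpoint (hypothetical helper)."""
    ip_counts = Counter()
    endpoint_counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not look like request entries
        ip, endpoint, _status = parsed
        ip_counts[ip] += 1
        endpoint_counts[endpoint] += 1
    return ip_counts, endpoint_counts


# Illustrative input only; these lines are not taken from a real log.
sample_lines = [
    '192.168.1.1 - - [03/Dec/2024:10:12:34 +0000] "GET /home HTTP/1.1" 200 512',
    '203.0.113.5 - - [03/Dec/2024:10:12:35 +0000] "GET /about HTTP/1.1" 200 256',
    '192.168.1.1 - - [03/Dec/2024:10:12:36 +0000] "POST /login HTTP/1.1" 401 128',
    '192.168.1.1 - - [03/Dec/2024:10:12:37 +0000] "GET /home HTTP/1.1" 200 512',
]
ip_counts, endpoint_counts = count_requests(sample_lines)
print(ip_counts.most_common())        # requests per IP, highest first
print(endpoint_counts.most_common(1)) # most frequently accessed endpoint
```

`Counter.most_common()` already returns entries in descending order of count, which matches the ordering described for the per-IP report.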
git clone https://github.com/hetissh/log-analysis-python-script.git
cd log-analysis-python-script
- Place your web server log file (e.g., sample.log) in the script's directory.
- Update the script if your log file has a different name or format.
Run the script by executing the following command:
python log_analysis.py
- Terminal Output: Key findings (e.g., requests per IP, most accessed endpoint, suspicious activity) will be displayed.
- CSV Output: Results are saved to log_analysis_results.csv in the same directory as the script.
IP Address Request Count
192.168.1.1 234
203.0.113.5 187
Most Frequently Accessed Endpoint:
/home (Accessed 403 times)
Suspicious Activity Detected:
IP Address Failed Login Attempts
192.168.1.100 56
203.0.113.34 12
Results saved in log_analysis_results.csv:
Requests per IP
IP Address,Request Count
192.168.1.1,234
203.0.113.5,187
Most Accessed Endpoint
Endpoint,Access Count
/home,403
Suspicious Activity
IP Address,Failed Login Count
192.168.1.100,56
203.0.113.34,12
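A multi-section CSV like the one above could be produced with the standard `csv` module, as in the sketch below. The function name `write_results` and its argument shapes are hypothetical; the actual script may build the file differently (for example with pandas). The input values simply reuse the sample figures shown above.

```python
import csv
import io


def write_results(out, ip_counts, top_endpoint, suspicious):
    """Write the three report sections to a file-like object (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")

    writer.writerow(["Requests per IP"])
    writer.writerow(["IP Address", "Request Count"])
    writer.writerows(ip_counts)

    writer.writerow(["Most Accessed Endpoint"])
    writer.writerow(["Endpoint", "Access Count"])
    writer.writerow(top_endpoint)

    writer.writerow(["Suspicious Activity"])
    writer.writerow(["IP Address", "Failed Login Count"])
    writer.writerows(suspicious)


# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_results(
    buffer,
    ip_counts=[("192.168.1.1", 234), ("203.0.113.5", 187)],
    top_endpoint=("/home", 403),
    suspicious=[("192.168.1.100", 56), ("203.0.113.34", 12)],
)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("log_analysis_results.csv", "w", newline="")` in place of the buffer.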
- Failed Login Threshold:
Modify the threshold for detecting suspicious activity in the script:
FAILED_LOGIN_THRESHOLD = 10 # Default value
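As a rough illustration of how the threshold might be applied, the sketch below flags IP addresses whose failed login count exceeds it. Only `FAILED_LOGIN_THRESHOLD` comes from the script; the `find_suspicious_ips` helper, the `(ip, status, message)` record shape, and the sample records are assumptions made for the example.

```python
from collections import Counter

FAILED_LOGIN_THRESHOLD = 10  # Default value


def find_suspicious_ips(records, threshold=FAILED_LOGIN_THRESHOLD):
    """Return {ip: failed_count} for IPs whose failed logins exceed threshold.

    records is an iterable of (ip, status, message) tuples (assumed shape).
    """
    failed = Counter()
    for ip, status, message in records:
        # A failed login is a 401 status or an "Invalid credentials" message.
        if status == 401 or "Invalid credentials" in message:
            failed[ip] += 1
    # Strictly greater than the threshold, highest counts first.
    return {ip: count for ip, count in failed.most_common() if count > threshold}


# Illustrative records only.
records = [("192.168.1.100", 401, "Invalid credentials")] * 12
records += [("203.0.113.34", 401, "")] * 10
records += [("192.168.1.1", 200, "")] * 50

print(find_suspicious_ips(records))
print(find_suspicious_ips(records, threshold=9))
```

With the default of 10, an IP with exactly 10 failed attempts is not flagged, since the check is "exceeds" rather than "reaches"; lowering the threshold to 9 flags it.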
Feel free to submit issues or pull requests if you:
- Encounter any bugs
- Want to enhance the script’s functionality
Developed by Hetissh (https://github.com/hetissh)