# Working with local files

In this section, the model extracts specific information from a local file, simulating the real-world task of processing structured or unstructured data.

You have a local file containing data (e.g., a JSON file, a CSV file, or a plain-text log). Your task is to write a prompt that instructs the model to read the file and extract specific information from it, such as identifying trends, filtering relevant data, or summarizing the contents.
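For small text-based files, one option is to read the file locally and embed its contents directly in the prompt. The following is a minimal sketch under that assumption; `build_file_prompt`, the `max_chars` limit, and the sample `sales.csv` data are hypothetical and only serve as illustration.

```python
from pathlib import Path
import tempfile


def build_file_prompt(path: str, question: str, max_chars: int = 20_000) -> str:
    """Embed the text of a local file in a prompt (hypothetical helper)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    # Truncate very large files so the prompt stays a manageable size.
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[truncated]"
    return (
        "Answer using only the file contents below.\n"
        f"--- {Path(path).name} ---\n"
        f"{text}\n"
        "--- end of file ---\n"
        f"Question: {question}"
    )


# Demo with a throwaway CSV file containing made-up data.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sales.csv"
    sample.write_text("month,units\nJan,10\nFeb,14\n", encoding="utf-8")
    prompt = build_file_prompt(str(sample), "Which month sold the most units?")

print(prompt)
```

The resulting string could then be passed to `model.generate_content(prompt)`. For binary formats such as PDF, uploading the file is the better fit.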

In [None]:
import google.generativeai as genai

# Replace the placeholder with your own API key.
genai.configure(api_key="<your key>")
model = genai.GenerativeModel(model_name="gemini-1.5-flash")

# Step 1: Upload the PDF file
pdf_file_path = "program-in-python.pdf"
sample_file = genai.upload_file(path=pdf_file_path, display_name="program-in-python PDF Document")

# Step 2: Generate a summary of the uploaded PDF
prompt = "Print a summary of the document."
response = model.generate_content([sample_file, prompt])

# Step 3: Print the summary response
print("Summary of the document:")
print(response.text)


In [None]:
prompt = "According to the document, what is the main programming language used?"
response = model.generate_content([sample_file, prompt])
print(response.text)


In [None]:
prompt = "According to the document, what is the way to print 1,22,333,444?"
response = model.generate_content([sample_file, prompt])
print(response.text)

# Exercise

You have a log file (`app.log`) that contains various events, such as errors, warnings, and informational messages. The goal is to extract key insights such as:

1. The total number of errors.
2. The most common error message.
3. The timestamp of the last error.

Validate the model's response against the output of this script:

In [None]:
!pip install pandas --quiet

import pandas as pd
import re
from collections import Counter

# Read the log file content
with open("app.log", "r") as file:
    log_content = file.readlines()

# Step 1: Parse the log content into a structured format
logs = []
for line in log_content:
    match = re.match(r"(?P<timestamp>\S+ \S+) \[(?P<level>\w+)] (?P<message>.+)", line)
    if match:
        logs.append(match.groupdict())

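# Step 2: Convert to DataFrame for analysis
# (explicit columns keep the filter below working even if no line matched)
df = pd.DataFrame(logs, columns=["timestamp", "level", "message"])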

# Step 3: Filter error logs
error_logs = df[df['level'] == 'ERROR']

# Step 4: Count total errors
total_errors = len(error_logs)

# Step 5: Find the most common error message
most_common_error = Counter(error_logs['message']).most_common(1)[0][0] if not error_logs.empty else None

# Step 6: Find the timestamp of the last error (assumes the log is in chronological order)
last_error_timestamp = error_logs['timestamp'].iloc[-1] if not error_logs.empty else None

# Print the results
print(f"Total Errors: {total_errors}")
print(f"Most Common Error: \"{most_common_error}\"")
print(f"Last Error Timestamp: {last_error_timestamp}")
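The validation script only recognizes lines shaped like `<date> <time> [LEVEL] message`; anything else is silently skipped. The sketch below runs the same pattern against a few made-up lines to show what gets parsed. The sample lines are hypothetical, and a real `app.log` may use a different layout, in which case the pattern would need adjusting.

```python
import re

# Same pattern as in the validation script, with the closing bracket escaped.
LOG_PATTERN = re.compile(r"(?P<timestamp>\S+ \S+) \[(?P<level>\w+)\] (?P<message>.+)")

# Hypothetical sample lines, for illustration only.
sample_lines = [
    "2024-01-15 10:32:01 [ERROR] Database connection failed",
    "2024-01-15 10:32:05 [INFO] Retrying connection",
    "not a log line",
]

# Keep only the lines that match the expected layout.
parsed = [m.groupdict() for line in sample_lines if (m := LOG_PATTERN.match(line))]

for entry in parsed:
    print(entry)
```

Comparing `len(parsed)` with the number of input lines is a quick way to notice that a log file uses a layout the pattern does not cover.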