<a href="https://colab.research.google.com/github/mxb02014/CoLab/blob/main/Untitled0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. Mounting Google Drive

Mount Google Drive so that the notebook can read the PDF files stored there. Running the cell prompts you to authorize access; follow the on-screen instructions.

In [None]:
from google.colab import drive
import os
import shutil

# Unmount Google Drive if already mounted
try:
  drive.flush_and_unmount()
  print("Google Drive unmounted.")
except ValueError:
  print("Google Drive was not mounted.")

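# Clean the mountpoint directory, but only if Drive is really unmounted:
# running rmtree on a live mount could delete the files in Google Drive itself.
mountpoint = '/content/drive'
print(f"Cleaning mountpoint directory: {mountpoint}")
if os.path.ismount(mountpoint) or os.path.isdir(os.path.join(mountpoint, 'MyDrive')):
    print(f"'{mountpoint}' still appears to be mounted; skipping cleanup.")
elif os.path.isdir(mountpoint):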
    try:
        # Attempt to remove the directory and its contents
        shutil.rmtree(mountpoint)
        print(f"Mountpoint directory '{mountpoint}' removed.")
    except OSError as e:
        print(f"Error removing mountpoint directory '{mountpoint}': {e}")
        # If removal fails, try to empty it
        try:
            for item in os.listdir(mountpoint):
                item_path = os.path.join(mountpoint, item)
                if os.path.isdir(item_path):
                    shutil.rmtree(item_path)
                else:
                    os.remove(item_path)
            print(f"Mountpoint directory '{mountpoint}' emptied.")
        except OSError as e_empty:
             print(f"Error emptying mountpoint directory '{mountpoint}': {e_empty}")
elif os.path.exists(mountpoint):
    # If it exists but is not a directory, remove it
    try:
        os.remove(mountpoint)
        print(f"Mountpoint '{mountpoint}' (not a directory) removed.")
    except OSError as e:
        print(f"Error removing mountpoint '{mountpoint}' (not a directory): {e}")

# Recreate the mountpoint directory if it doesn't exist or was removed
if not os.path.exists(mountpoint):
    try:
        os.makedirs(mountpoint)
        print(f"Mountpoint directory '{mountpoint}' recreated.")
    except OSError as e:
        print(f"Error recreating mountpoint directory '{mountpoint}': {e}")
        print("Cannot proceed with mounting Google Drive.")
        # exit() # Do not exit in a notebook cell, just print error


# Mount Google Drive
print("Mounting Google Drive...")
# Use force_remount=True to ensure a fresh mount if needed
drive.mount(mountpoint, force_remount=True)
print("Google Drive mounted.")

Google Drive unmounted.
Cleaning mountpoint directory: /content/drive
Mountpoint directory '/content/drive' recreated.
Mounting Google Drive...
Mounted at /content/drive
Google Drive mounted.
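The cleanup step above deletes whatever sits under `/content/drive`, which is only safe once Drive is actually unmounted. Below is a minimal sketch of a guarded cleanup. It assumes, as in Colab, that a live mount is a mount point and/or exposes a `MyDrive` folder under it. The helper names `looks_like_live_mount` and `safe_reset_mountpoint` are hypothetical, and the demo runs on temporary directories rather than the real mount point.

```python
import os
import shutil
import tempfile


def looks_like_live_mount(mountpoint: str) -> bool:
    """Heuristic (assumption): a live Colab Drive mount is a mount point
    and/or contains a 'MyDrive' folder."""
    return os.path.ismount(mountpoint) or os.path.isdir(os.path.join(mountpoint, "MyDrive"))


def safe_reset_mountpoint(mountpoint: str) -> bool:
    """Recreate `mountpoint` as an empty directory unless it looks mounted.

    Returns True if the path was reset, False if it was left untouched.
    """
    if looks_like_live_mount(mountpoint):
        return False  # never rmtree a live mount: that would delete real Drive files
    if os.path.isdir(mountpoint):
        shutil.rmtree(mountpoint)
    elif os.path.exists(mountpoint):
        os.remove(mountpoint)  # a stray file where the directory should be
    os.makedirs(mountpoint)
    return True


# Demo on throwaway directories, not on /content/drive.
demo_root = tempfile.mkdtemp()
live = os.path.join(demo_root, "live")
os.makedirs(os.path.join(live, "MyDrive"))
stale = os.path.join(demo_root, "stale")
os.makedirs(stale)
open(os.path.join(stale, "leftover.txt"), "w").close()

print(safe_reset_mountpoint(live), safe_reset_mountpoint(stale))  # False True
```

In the notebook, this check belongs right before the `shutil.rmtree(mountpoint)` call. `drive.mount(..., force_remount=True)` still performs the actual remount.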


### 2. Extracting text from the PDF files

Extract text from the PDF files in the specified directory. Set `pdf_directory` to the Google Drive path where the PDF files are stored.

In [None]:
# Import necessary libraries
from pdfminer.high_level import extract_text_to_fp
import io
import os
import re

# Define the directory containing the PDF files
# --- Set this to the exact Google Drive path where your PDF files are stored ---
pdf_directory = '/content/drive/MyDrive/Colab Notebooks/data/downloads/'

# List to store extracted text data
extracted_data = []

# Check if the directory exists
if not os.path.exists(pdf_directory):
    print(f"Error: Directory not found at {pdf_directory}")
else:
    # Iterate through files in the directory
    print(f"Scanning directory: {pdf_directory}")
    for filename in os.listdir(pdf_directory):
        if filename.lower().endswith('.pdf'):  # also matches ".PDF"
            filepath = os.path.join(pdf_directory, filename)
            print(f"Processing file: {filename}")

            try:
                # Use io.StringIO to capture the output
                output_string = io.StringIO()

                # Extract text from the PDF to the string buffer
                with open(filepath, 'rb') as infile:
                    extract_text_to_fp(infile, output_string)

                # Get the full text
                full_text = output_string.getvalue()

                # Split the text into pages (simple split by form feed character)
                pages = full_text.split('\x0c')

                page2_text = ""
                page3_relevant_text = ""

                if len(pages) > 1:
                    page2_text = pages[1] # Page 2 text (index 1)

                if len(pages) > 2:
                    page3_text = pages[2] # Page 3 text (index 2)

                    # Find the relevant section starting from "朝礼拝" or "主日礼拝" or "集計"
                    # and ending at "◎" or the end of the page
                    relevant_section_match = re.search(r'(朝礼拝|主日礼拝|集計).*?(◎|$)', page3_text, re.DOTALL)

                    if relevant_section_match:
                        page3_relevant_text = relevant_section_match.group(0) # Extract the full matched text


                # Append the extracted data to the list
                extracted_data.append({
                    'filename': filename,
                    'page2_text': page2_text,
                    'page3_relevant_text': page3_relevant_text
                })

                print(f"  Extracted text from page 2 and relevant section from page 3.")

            except Exception as e:
                print(f"  Error processing file {filename}: {e}")

print(f"\nFinished extracting text from {len(extracted_data)} files.")
# The extracted_data list is now populated.

Scanning directory: /content/drive/MyDrive/Colab Notebooks/data/downloads/
Processing file: 240114通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240121通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240128聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240211通常週報♡２.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240218通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240225通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240204聖餐式週報♡２.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241124通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241117通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241110通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241103聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241027聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241020通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241013通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241006聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240929通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240922通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240915通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240908通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240901聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240825通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240818通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240811通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240804聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240728通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240721通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240714通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240707聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240630通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240623通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240616通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240609通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240602聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240519聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240512通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240428通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240421通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240414通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240407聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240331聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240324通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240317通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240310通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240303聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240526通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241208通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241215通常週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 240505聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.
Processing file: 241201聖餐式週報♡.pdf
  Extracted text from page 2 and relevant section from page 3.

Finished extracting text from 49 files.
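The cell above relies on two conventions. First, pdfminer separates pages with a form-feed character (`'\x0c'`). Second, the interesting part of page 3 runs from the first 朝礼拝, 主日礼拝 or 集計 up to the first ◎. The sketch below reproduces just that logic on a made-up string, so the regex can be checked without opening a PDF. The sample text and the helper names `split_pages` / `relevant_section` are hypothetical.

```python
import re

# Hypothetical sample of pdfminer output: three pages separated by form feeds.
sample_text = (
    "表紙\x0c"
    "2024年1月14日\x0c"
    "お知らせ\n集計\n朝礼拝 山田太郎、鈴木花子 大人2名\n◎ 来週の予定\x0c"
)

# Same pattern as the cell above: lazy match from a keyword to the first ◎ or the end.
SECTION_PATTERN = re.compile(r"(朝礼拝|主日礼拝|集計).*?(◎|$)", re.DOTALL)


def split_pages(full_text: str) -> list:
    """Split extracted text into pages on the form-feed character."""
    return full_text.split("\x0c")


def relevant_section(page_text: str) -> str:
    """Text from the first 朝礼拝/主日礼拝/集計 through the first ◎ (or to the end)."""
    match = SECTION_PATTERN.search(page_text)
    return match.group(0) if match else ""


pages = split_pages(sample_text)
page2_text = pages[1] if len(pages) > 1 else ""
page3_relevant_text = relevant_section(pages[2]) if len(pages) > 2 else ""
print(repr(page3_relevant_text))
```

Because the match is lazy, it stops at the first ◎. Whichever keyword appears first wins, so when 集計 precedes 朝礼拝 (as in this sample) the section starts at 集計. A trailing form feed also produces an empty last "page".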


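### 3. Processing the extracted data and extracting names

From the extracted text, extract each bulletin's date and the names listed under 朝礼拝 (morning service) and 主日礼拝 (Sunday service).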
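The date logic in the cell below works as follows. It prefers a full `YYYY年MM月DD日` date on page 2. Failing that, it takes the first `MM月DD日` and borrows the year from the `YYMMDD` filename prefix, assuming the 2000s. It then shifts the result back 7 days, because the cell records attendance under the date one week before the bulletin, presumably since each bulletin reports the previous week's attendance. Below is a condensed sketch of that logic; the names `bulletin_date` and `attendance_date` are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Optional

FULL_DATE = re.compile(r"(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日")
MONTH_DAY = re.compile(r"(\d{1,2})\s*月\s*(\d{1,2})\s*日")
FILENAME_YY = re.compile(r"(\d{2})\d{4}")  # e.g. "240114..." -> "24"


def bulletin_date(page2_text: str, filename: str) -> Optional[date]:
    """Date printed on page 2, with the year inferred from the filename if missing."""
    m = FULL_DATE.search(page2_text)
    if m:
        year, month, day = (int(g) for g in m.groups())
    else:
        m = MONTH_DAY.search(page2_text)
        yy = FILENAME_YY.match(filename)
        if not (m and yy):
            return None
        year, month, day = 2000 + int(yy.group(1)), int(m.group(1)), int(m.group(2))
    try:
        return date(year, month, day)
    except ValueError:  # e.g. 2月30日
        return None


def attendance_date(page2_text: str, filename: str) -> Optional[date]:
    """The date attendance is recorded under: 7 days before the bulletin date."""
    d = bulletin_date(page2_text, filename)
    return d - timedelta(days=7) if d else None


print(attendance_date("1 月 14 日 主日礼拝", "240114通常週報♡.pdf"))  # 2024-01-07
```

One limitation carries over from the original. The search takes the first `MM月DD日` on page 2, so if another date (for example an event announcement) precedes the service date, that date is used instead.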

In [None]:
import re
from datetime import datetime, timedelta
import pandas as pd
import os
from collections import defaultdict

# Assuming extracted_data list is available from the previous step

processed_attendance_data = []

def extract_date_from_page2(text, filename):
    """
    Extract a date from the text of page 2 and infer year from filename.
    Assumes date is in "YYYY 年 MM 月 DD 日", "YYYY年MM月DD日", or "MM 月 DD 日" format.
    Infers year from filename if not present in text.
    Returns a datetime.date object or None if not found/invalid.
    """
    year = None
    # Attempt to extract year from filename (e.g., "24" from "240114...")
    filename_year_match = re.match(r'(\d{2})\d{4}', filename)
    if filename_year_match:
        # Assume year is in 20xx format
        year = 2000 + int(filename_year_match.group(1))


    # Try to extract date in "YYYY 年 MM 月 DD 日" or "YYYY年MM月DD日" format first
    date_match_yyyy = re.search(r'(\d{4})\s*年\s*(\d{1,2})\s*月\s*(\d{1,2})\s*日', text)

    if date_match_yyyy:
        year = int(date_match_yyyy.group(1))
        month = int(date_match_yyyy.group(2))
        day = int(date_match_yyyy.group(3))
    else:
        # If YYYY format not found, try to extract "MM 月 DD 日" format
        date_match_mmdd = re.search(r'(\d{1,2})\s*月\s*(\d{1,2})\s*日', text)
        if date_match_mmdd and year: # Only proceed if MM/DD format is found AND year was inferred from filename
            month = int(date_match_mmdd.group(1))
            day = int(date_match_mmdd.group(2))
        else:
            return None # Return None if no date pattern found or year is missing for MM/DD format


    # If we have year, month, and day, try to construct a date object
    if year is not None and month is not None and day is not None:
        try:
            extracted_date = datetime(year, month, day).date()
            return extracted_date
        except ValueError:
            return None # Return None for invalid date values (e.g., 2月30日)

    return None # Return None if we couldn't get all date components


def extract_names_from_section(text):
    """
    Extract individual names from a 朝礼拝 or 主日礼拝 text section.

    Steps:
    - Remove "(聖餐式)" and "(氏名順不同)" markers, with or without brackets.
    - Split on '、', ',', '，', '､' and any whitespace, including newlines.
    - Remove every character outside the CJK ideograph, hiragana, katakana,
      CJK symbol and half-/full-width form ranges (e.g. ASCII letters and digits).
    - Strip a leading '初' from each entry.
    - Drop head-count labels such as "大人", "子供", "計", "大人 名" or "大名",
      and drop single-character leftovers.
    """
    if not text:
        return []

    # Remove "(聖餐式)" and "(氏名順不同)" and leading/trailing spaces from the whole text first
    cleaned_text = re.sub(r'[\(（]?聖餐式[\)）]?', '', text) # Remove "(聖餐式)" variations
    cleaned_text = re.sub(r'[\(（]?氏名順不同[\)）]?', '', cleaned_text) # Remove "(氏名順不同)" variations
    cleaned_text = cleaned_text.strip()


    normalized = []
    # Split the cleaned input text by '、', ',', '，', '､', spaces, or newlines
    # Using a more flexible split pattern that handles multiple spaces and newlines as delimiters
    # Including both full-width and half-width commas and dots as potential delimiters
    # Also handling the half-width middle dot '･' as a potential name character, not a delimiter.
    potential_names = [item.strip() for item in re.split(r'[、,，､\s\n]+', cleaned_text) if item.strip()]


    # Updated regex to include a wider range of Unicode CJK ideographs, Hiragana, Katakana (full and half-width),
    # and the prolonged sound mark, and the half-width middle dot '･'.
    # This regex keeps these characters.
    # Added \u3000-\u303f for CJK symbols and punctuation (including full-width middle dot)
    # Added \uff00-\uffef for half-width and full-width forms (including half-width katakana and middle dot)
    keep_chars_pattern = re.compile(r'[^\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\u3040-\u309F\u30A0-\u30FF\u30FC-\u30FD\u3000-\u303f\uff00-\uffef]+')


    # Patterns for head-count labels that are not names (including combinations with '名').
    # They are applied with search(), so any entry that merely *contains* one of
    # these substrings (for example '名' or '計') is dropped as well.
    filter_patterns = [
        "大人", "子供", "名", "計", "氏名順不同",
        "大人名", "子供名", "計名",  # "大人名", "子供名", "計名"
        r"大人\s*名", r"子供\s*名", r"計\s*名",  # "大人 名", "子供 名", "計 名" with optional space
        r"^大\d+名", r"^子\d+名", r"^計\d+名",  # e.g. "大6名", "子1名", "計7名" at the start of an entry
    ]
    compiled_filter_patterns = [re.compile(p) for p in filter_patterns]


    for item in potential_names:
        if item.strip(): # Process non-empty items (after stripping)
            # Use the updated regex to remove unwanted characters
            cleaned_name = keep_chars_pattern.sub('', item).strip()

            # Remove '初' prefix if the cleaned name starts with it
            if cleaned_name.startswith('初'):
                cleaned_name = cleaned_name[1:].strip()

            # Check if the cleaned name matches any of the filter patterns
            is_filtered = False
            for pattern in compiled_filter_patterns:
                if pattern.search(cleaned_name): # Use search() instead of match() for partial matches
                    is_filtered = True
                    break

            # Keep the entry only if it is non-empty, not filtered, and longer than
            # one character (single characters are treated as extraction noise).
            if cleaned_name and not is_filtered and len(cleaned_name) > 1:
                normalized.append(cleaned_name)


    return normalized


print(f"Processing extracted text for {len(extracted_data)} files...")

# Clear previous results before reprocessing
processed_attendance_data = []


for data_entry in extracted_data:
    filename = data_entry['filename']
    page2_text = data_entry['page2_text']
    page3_relevant_text = data_entry['page3_relevant_text']

    # print(f"\n--- Processing file: {filename} ---") # Optional debug

    # 1. Extract date from page 2 text, using filename for year if needed
    extracted_date = extract_date_from_page2(page2_text, filename)
    calculated_data_date = None
    if extracted_date:
        # Calculate the data date (7 days prior)
        calculated_data_date = extracted_date - timedelta(days=7)
        # date_key = calculated_data_date.strftime('%Y-%m-%d') # Optional debug
        # print(f"Extracted date from page 2: {extracted_date.strftime('%Y年%m月%d日')}") # Optional debug
        # print(f"Calculated data date (7 days prior): {date_key}") # Optional debug
    # else:
        # date_key = filename # Use filename as date key if date extraction fails # Optional debug
        # print("Could not extract a valid date from page 2 or infer year from filename.") # Optional debug
        # print(f"Using filename as date key: {date_key}") # Optional debug


    # 2. Extract 朝礼拝 and 主日礼拝 names using string search and slicing
    chourei_names = []
    shujitsu_names = []

    if page3_relevant_text:
        chourei_keyword = "朝礼拝"
        shujitsu_keyword = "主日礼拝"
        # Flexible end pattern to account for variations like "大人", "大6名", "大人 7 名" etc.
        # Search for "大人" or "大" followed by digits and optional "名", or "子#名?", "計#名?"
        end_pattern_regex = re.compile(r"大人|大\d+名?|子\d+名?|計\d+名?")


        # --- Extract 朝礼拝 section using string search and slicing ---
        chourei_start_index = page3_relevant_text.find(chourei_keyword)
        chourei_section_text = ""
        if chourei_start_index != -1:
            text_after_chourei_keyword = page3_relevant_text[chourei_start_index + len(chourei_keyword):]
            end_match_chourei = end_pattern_regex.search(text_after_chourei_keyword)
            if end_match_chourei:
                chourei_section_text = text_after_chourei_keyword[:end_match_chourei.start()].strip()
                # print(f"\n朝礼拝 Section Text Extracted (String Search):") # Optional debug
                # print(chourei_section_text) # Optional debug
                chourei_names = extract_names_from_section(chourei_section_text)
                # print("朝礼拝 names (extracted and normalized):", chourei_names) # Optional debug
            # else: # Optional debug
                # print(f"\nCould not find end pattern after '{chourei_keyword}' for 朝礼拝 section.") # Optional debug

        # --- Extract 主日礼拝 section using string search and slicing ---
        shujitsu_start_index = page3_relevant_text.find(shujitsu_keyword)
        shujitsu_section_text = ""
        if shujitsu_start_index != -1:
            text_after_shujitsu_keyword = page3_relevant_text[shujitsu_start_index + len(shujitsu_keyword):]
            end_match_shujitsu = end_pattern_regex.search(text_after_shujitsu_keyword)
            if end_match_shujitsu:
                shujitsu_section_text = text_after_shujitsu_keyword[:end_match_shujitsu.start()].strip()
                # print(f"\n主日礼拝 Section Text Extracted (String Search):") # Optional debug
                # print(shujitsu_section_text) # Optional debug
                shujitsu_names = extract_names_from_section(shujitsu_section_text)
                # print("主日礼拝 names (extracted and normalized):", shujitsu_names) # Optional debug
            # else: # Optional debug
                 # print(f"\nCould not find end pattern after '{shujitsu_keyword}' for 主日礼拝 section.") # Optional debug


    # Store the processed data for the file, including if date was not extracted
    processed_attendance_data.append({
        'filename': filename,
        'date': calculated_data_date, # Use the calculated date (7 days prior) or None
        'chourei_names': chourei_names,
        'shujitsu_names': shujitsu_names,
        'date_mismatch': False # Date mismatch check is not critical for basic processing
    })


print("\nFinished processing extracted text and dates for all files.")
# print(f"Sample processed_attendance_data for the first file: {processed_attendance_data[0]}") # Optional: print a sample

# The processed_attendance_data list now contains structured data for each file,
# including the calculated date, extracted names, and a flag for date mismatch.
# This data is ready for counting and creating the Excel file.

Processing extracted text for 49 files...

Finished processing extracted text and dates for all files.
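The section slicing in the cell above can be tried in isolation, together with the name splitting. Slicing starts right after the keyword and stops at the first head-count marker such as `大人`, `大6名` or `計7名`. The sketch below is simplified: it skips the character-range cleanup and the filter list. It keeps only the split on `、 , ， ､` and whitespace, the leading-`初` removal and the length > 1 rule. The sample text and helper names are hypothetical.

```python
import re

END_PATTERN = re.compile(r"大人|大\d+名?|子\d+名?|計\d+名?")  # same end markers as the cell above
SPLIT_PATTERN = re.compile(r"[、,，､\s]+")


def section_after(text: str, keyword: str) -> str:
    """Text between `keyword` and the next head-count marker ('' if either is missing)."""
    start = text.find(keyword)
    if start == -1:
        return ""
    rest = text[start + len(keyword):]
    end = END_PATTERN.search(rest)
    return rest[:end.start()].strip() if end else ""


def split_names(section: str) -> list:
    """Split a section into names, dropping a leading '初' and 1-character leftovers."""
    names = []
    for item in SPLIT_PATTERN.split(section):
        if item.startswith("初"):
            item = item[1:]
        if len(item) > 1:
            names.append(item)
    return names


# Hypothetical page-3 excerpt
sample = "集計\n朝礼拝 山田太郎、鈴木花子 大人2名\n主日礼拝 佐藤一郎，初田中次郎\n大人2名 子1名"
print(split_names(section_after(sample, "朝礼拝")))    # ['山田太郎', '鈴木花子']
print(split_names(section_after(sample, "主日礼拝")))  # ['佐藤一郎', '田中次郎']
```

As in the original, a section with no head-count marker after it yields no names at all. The `初` rule also removes the first character of any real name that happens to begin with 初.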


### 4. Aggregating the results and saving them to Excel

Aggregate the extracted data and save it to an Excel file, organized as one sheet per service.
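The cell below builds the per-person, per-date 0/1 table by hand with nested dictionaries. For comparison, here is a sketch of the same idea that runs `pandas.crosstab` on a long list of (name, date) pairs. The input records are hypothetical and shaped like `processed_attendance_data`, with dates already formatted as strings. One deliberate difference: here the columns cover every dated record, whereas the cell below only creates columns for dates on which someone attended.

```python
import pandas as pd

# Hypothetical input, shaped like processed_attendance_data (dates as strings).
records = [
    {"date": "2024-01-07", "chourei_names": ["山田太郎"], "shujitsu_names": ["山田太郎", "佐藤一郎"]},
    {"date": "2024-01-14", "chourei_names": [], "shujitsu_names": ["佐藤一郎"]},
]


def attendance_table(records: list, names_key: str) -> pd.DataFrame:
    """0/1 attendance matrix (rows: 氏名, columns: dates) plus a 合計出現回数 total column."""
    # Long format: one (name, date) row per attendance, skipping records without a date.
    pairs = [(name, r["date"]) for r in records if r["date"] for name in r[names_key]]
    long_df = pd.DataFrame(pairs, columns=["氏名", "date"]).drop_duplicates()
    wide = pd.crosstab(long_df["氏名"], long_df["date"])
    # Make sure every dated record gets a column, even if nobody attended that service.
    all_dates = sorted({r["date"] for r in records if r["date"]})
    wide = wide.reindex(columns=all_dates, fill_value=0)
    wide["合計出現回数"] = wide.sum(axis=1)
    return wide.reset_index()


shujitsu_df = attendance_table(records, "shujitsu_names")
print(shujitsu_df)
```

Because duplicates are dropped before counting, each cell is 0 or 1 even if a name appears twice in one bulletin.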

In [None]:
import pandas as pd
import os
import re
from collections import defaultdict
from datetime import datetime  # used below to show the saved file's timestamp

# Assuming processed_attendance_data list is available from the previous step

# Attendance per person, date and service:
# { name: { 'YYYY-MM-DD': {'朝': 1 or 0, '主日': 1 or 0}, ... }, ... }
# Names are aggregated exactly as extracted (no middle-dot normalization).
attendance_by_date_and_name = defaultdict(lambda: defaultdict(lambda: {'朝': 0, '主日': 0}))


print(f"Aggregating attendance data for {len(processed_attendance_data)} files...")

# Normalize a name by removing full-width and half-width middle dots.
# Not used in this version (names are aggregated as extracted); kept for reference.
def normalize_name_by_removing_middle_dot(name):
    """Removes full-width '・' and half-width '･' middle dots from a name."""
    if not name:
        return name
    # Use re.sub to replace both full-width and half-width middle dots with an empty string
    return re.sub(r'[・･]', '', name).strip()


# Collect all unique names for each service separately
all_chourei_names = set()
all_shujitsu_names = set()

for entry in processed_attendance_data:
    filename = entry['filename']
    data_date = entry['date']
    chourei_names = entry['chourei_names']
    shujitsu_names = entry['shujitsu_names']
    # date_mismatch = entry['date_mismatch'] # This flag is no longer used for separating data


    # Use filename as a fallback date string if calculated_data_date is None
    date_key = data_date.strftime('%Y-%m-%d') if data_date else filename

    # print(f"\n--- Processing file for aggregation: {filename} (Date Key: {date_key}) ---") # Optional detailed debug
    # print("朝礼拝 extracted names:", chourei_names) # Optional detailed debug
    # print("主日礼拝 extracted names:", shujitsu_names) # Optional detailed debug

    # Add names to the respective sets of all unique names
    all_chourei_names.update(chourei_names)
    all_shujitsu_names.update(shujitsu_names)


    # Aggregate attendance only for files where a valid date was extracted
    if data_date:
        for name in chourei_names:
            attendance_by_date_and_name[name][date_key]['朝'] = 1
            # print(f"  Added '{name}' to 朝礼拝 for date {date_key}") # Optional detailed debug
        for name in shujitsu_names:
            attendance_by_date_and_name[name][date_key]['主日'] = 1
            # print(f"  Added '{name}' to 主日礼拝 for date {date_key}") # Optional detailed debug
        # print(f"Aggregated attendance for {filename} (Date: {date_key}).") # Optional detailed debug
    else:
         print(f"Could not process attendance for {filename} due to missing date.")


print("\nFinished processing extracted text and dates for all files.")

# --- Debugging: print the contents of attendance_by_date_and_name ---
# print("\n--- Debugging: Contents of attendance_by_date_and_name after processing files ---")
# for name, date_data in attendance_by_date_and_name.items():
#     print(f"Name: {name}")
#     for date_key, services_data in date_data.items():
#         print(f"  {date_key}: 朝={services_data['朝']}, 主日={services_data['主日']}")
# print("----------------------------------------------------------------------------")


# Sorted lists of unique names for each service
unique_chourei_names = sorted(all_chourei_names)
unique_shujitsu_names = sorted(all_shujitsu_names)

# All dates on which anyone attended either service (only files with a valid date are aggregated)
all_dates_or_filenames = sorted({date for name_data in attendance_by_date_and_name.values() for date in name_data})


# --- Debugging: print the lists of unique names ---
print("\n--- Debugging: Final List of unique 朝礼拝 names (unique_chourei_names) ---")
print(unique_chourei_names)
print("\n--- Debugging: Final List of unique 主日礼拝 names (unique_shujitsu_names) ---")
print(unique_shujitsu_names)
print("-----------------------------------------------------")
# print("\n--- Debugging: List of all dates/filenames ---")
# print(all_dates_or_filenames)
# print("-------------------------------------------------------")


# One DataFrame per service: names as the index, one column per date.
# Initializing with integer 0 (rather than building an empty object-dtype frame and
# calling fillna(0)) keeps the columns numeric and avoids the FutureWarning that
# recent pandas versions emit about downcasting in fillna.
chourei_excel_df = pd.DataFrame(0, index=unique_chourei_names, columns=all_dates_or_filenames)
shujitsu_excel_df = pd.DataFrame(0, index=unique_shujitsu_names, columns=all_dates_or_filenames)


# Fill in attendance using the names as extracted: 1 if present, 0 otherwise
print("\n--- Debugging: Populating DataFrames ---")
for name, dates_data in attendance_by_date_and_name.items():
    # Check if the name exists in the index of the respective DataFrame before trying to set values
    if name in chourei_excel_df.index:
        for date_or_filename, services_data in dates_data.items():
            if date_or_filename in chourei_excel_df.columns:
                chourei_excel_df.loc[name, date_or_filename] = services_data['朝']
            # else:
                 # print(f"    Warning: Date/filename {date_or_filename} not found in chourei_excel_df columns.") # Optional warning

    if name in shujitsu_excel_df.index:
        for date_or_filename, services_data in dates_data.items():
            if date_or_filename in shujitsu_excel_df.columns:
                shujitsu_excel_df.loc[name, date_or_filename] = services_data['主日']
            # else:
            #     print(f"    Warning: Date/filename {date_or_filename} not found in shujitsu_excel_df columns.") # Optional warning

# print("--- Debugging: Populating DataFrames Finished ---") # Optional debug


# Total number of attendances per person (合計出現回数, "total occurrences")
chourei_excel_df['合計出現回数'] = chourei_excel_df.sum(axis=1)
shujitsu_excel_df['合計出現回数'] = shujitsu_excel_df.sum(axis=1)


# Reset the index so that the names become a '氏名' ("name") column
chourei_excel_df = chourei_excel_df.reset_index().rename(columns={'index': '氏名'})
shujitsu_excel_df = shujitsu_excel_df.reset_index().rename(columns={'index': '氏名'})


# --- Debugging: display the final DataFrames before saving to Excel ---
print("\n--- Debugging: Final 朝礼拝 DataFrame before saving ---")
display(chourei_excel_df.head())
print("\n--- Debugging: Final 主日礼拝 DataFrame before saving ---")
display(shujitsu_excel_df.head())
print("-------------------------------------------------------")


print("\n--- 朝礼拝 集計結果 ---")
display(chourei_excel_df.head())
print("\n--- 主日礼拝 集計結果 ---")
display(shujitsu_excel_df.head())


# Save the workbook in the parent directory of the data directory.
# os.path.normpath strips the trailing slash first; without it, os.path.dirname
# returns the 'downloads' directory itself instead of its parent.
data_directory = '/content/drive/MyDrive/Colab Notebooks/data/downloads/'  # same directory as pdf_directory
parent_directory = os.path.dirname(os.path.normpath(data_directory))
output_excel_filename = '礼拝出席者集計_詳細.xlsx'  # "worship attendance summary (detailed)"
output_excel_path = os.path.join(parent_directory, output_excel_filename)


# --- Debugging: Check file path and permissions ---
print(f"\nAttempting to save Excel file to: {output_excel_path}")

output_dir = os.path.dirname(output_excel_path)
print(f"Checking output directory: {output_dir}")

# Check if directory exists
if not os.path.exists(output_dir):
    print(f"Output directory does NOT exist. Attempting to create: {output_dir}")
    try:
        os.makedirs(output_dir)
        print(f"Output directory created successfully: {output_dir}")
    except OSError as e:
        print(f"Error creating output directory {output_dir}: {e}")
        print("Please ensure Google Drive is correctly mounted and you have permissions.")

# Check if directory is writable
if os.path.exists(output_dir):
    if os.access(output_dir, os.W_OK):
        print(f"Output directory {output_dir} is writable.")
    else:
        print(f"Output directory {output_dir} is NOT writable.")
        print("Please check Google Drive permissions.")
# --------------------------------------------------

# Show the computed output path explicitly
print(f"\nCalculated output Excel file path: {output_excel_path}")


print(f"\nSaving integrated results to: {output_excel_path}")


# Save the DataFrames to an Excel file with separate sheets
try:
    with pd.ExcelWriter(output_excel_path) as writer:
        chourei_excel_df.to_excel(writer, sheet_name='朝礼拝_集計', index=False)
        shujitsu_excel_df.to_excel(writer, sheet_name='主日礼拝_集計', index=False)

    print("Successfully saved the detailed attendance counts to Excel.")
    # Show the saved file's timestamp and size
    if os.path.exists(output_excel_path):
        timestamp = os.path.getmtime(output_excel_path)
        dt_object = datetime.fromtimestamp(timestamp)
        file_size = os.path.getsize(output_excel_path)
        print(f"Saved file timestamp: {dt_object}")
        print(f"Saved file size: {file_size} bytes")
    else:
        print("Saved file not found after successful save message.")


except Exception as e:
    print(f"Error saving Excel file: {e}")
    print("Please ensure Google Drive is correctly mounted and you have write permissions to the directory.")

print("\nFinished processing all files and saving results.")

# Verify from the Colab file system that the saved file exists
print(f"\nVerifying file existence and timestamp at {output_excel_path} from Colab file system:")
if os.path.exists(output_excel_path):
    timestamp = os.path.getmtime(output_excel_path)
    dt_object = datetime.fromtimestamp(timestamp)
    file_size = os.path.getsize(output_excel_path)
    print(f"File found.")
    print(f"File timestamp from Colab: {dt_object}")
    print(f"File size from Colab: {file_size} bytes")
else:
    print("File NOT found from Colab file system.")

Aggregating attendance data for 49 files...

Finished processing extracted text and dates for all files.

--- Debugging: Final List of unique 朝礼拝 names (unique_chourei_names) ---
['三浦香代子', '上村玲子', '中島しのぶ', '中島ゆりの', '中島和喜', '中島康文', '中島康文･美津江', '中島瑞貴', '中島結実枝', '中島美津江', '中村優響', '中村恵佑', '中村淳平', '佐々木彩乃', '佐々木真喜子', '土本千保美', '土本瑞希', '富田桃香', '川口学', '斉藤純子', '木下多津子', '木下学', '木下結愛', '本城美貴', '本城葵', '柳川昌平', '柳川瞬平', '柳川葉子', '江藤直純', '矢坂陽子', '福島由貴子', '芳野豊', '長谷川卓也', '長谷川安奈', '長谷川耀子', '長谷川青水', 'ｸﾏﾗｼﾝﾊﾏ･美羽', 'ｸﾏﾗｼﾝﾊﾑ新', 'ｸﾏﾗｼﾝﾊﾑ美優', 'ｸﾏﾗｼﾝﾊﾑ美羽', 'ｸﾏﾗｼﾝﾊﾑ･新', 'ｸﾏﾗｼﾝﾊﾑ･美羽', 'ｸﾏﾗｼﾝﾊﾑ･ｴｲﾄﾞﾘｱﾝ']

--- Debugging: Final List of unique 主日礼拝 names (unique_shujitsu_names) ---
['三橋和弘', '三浦純子', '三浦香代子', '上村玲子', '世川勇', '世川岬子', '中井フタバ', '中井康人', '中博明', '中台明美', '中山一枝', '中島和喜', '中島康文', '中島瑞貴', '中島結実枝', '中島美津江', '中川俊介', '中川千恵子', '中川静子', '中村優響', '中村恵佑', '中村沙絵', '中村淳平', '中道敦子', '久保田淑子', '久米大介', '佐藤', '佐藤理恵', '佐藤穂佳', '北岡渓子', '友邉衣香', '友邊衣香', '四戸二予', '四戸大也', '坂本峯子', '外村由紀', '大竹剣太', '大竹庸介', '大関敏子', '宇津木彰', '富田桃香', '小室香', '小澤由美




--- Debugging: Final 朝礼拝 DataFrame before saving ---


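| | 氏名 | 2024-01-07 | 2024-01-14 | 2024-01-21 | 2024-01-28 | 2024-02-04 | 2024-02-11 | 2024-02-18 | 2024-02-25 | 2024-03-03 | ... | 2024-10-13 | 2024-10-20 | 2024-10-27 | 2024-11-03 | 2024-11-10 | 2024-11-17 | 2024-11-24 | 2024-12-01 | 2024-12-08 | 合計出現回数 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 三浦香代子 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 1 | 上村玲子 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 12 |
| 2 | 中島しのぶ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 13 |
| 3 | 中島ゆりの | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 14 |
| 4 | 中島和喜 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |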



--- Debugging: Final 主日礼拝 DataFrame before saving ---


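| | 氏名 | 2024-01-07 | 2024-01-14 | 2024-01-21 | 2024-01-28 | 2024-02-04 | 2024-02-11 | 2024-02-18 | 2024-02-25 | 2024-03-03 | ... | 2024-10-13 | 2024-10-20 | 2024-10-27 | 2024-11-03 | 2024-11-10 | 2024-11-17 | 2024-11-24 | 2024-12-01 | 2024-12-08 | 合計出現回数 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 三橋和弘 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 三浦純子 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 35 |
| 2 | 三浦香代子 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 |
| 3 | 上村玲子 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 3 |
| 4 | 世川勇 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |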


-------------------------------------------------------

--- 朝礼拝 集計結果 ---


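| | 氏名 | 2024-01-07 | 2024-01-14 | 2024-01-21 | 2024-01-28 | 2024-02-04 | 2024-02-11 | 2024-02-18 | 2024-02-25 | 2024-03-03 | ... | 2024-10-13 | 2024-10-20 | 2024-10-27 | 2024-11-03 | 2024-11-10 | 2024-11-17 | 2024-11-24 | 2024-12-01 | 2024-12-08 | 合計出現回数 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 三浦香代子 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 1 | 上村玲子 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 12 |
| 2 | 中島しのぶ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 13 |
| 3 | 中島ゆりの | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 14 |
| 4 | 中島和喜 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |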



--- 主日礼拝 集計結果 ---


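| | 氏名 | 2024-01-07 | 2024-01-14 | 2024-01-21 | 2024-01-28 | 2024-02-04 | 2024-02-11 | 2024-02-18 | 2024-02-25 | 2024-03-03 | ... | 2024-10-13 | 2024-10-20 | 2024-10-27 | 2024-11-03 | 2024-11-10 | 2024-11-17 | 2024-11-24 | 2024-12-01 | 2024-12-08 | 合計出現回数 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 三橋和弘 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 三浦純子 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 35 |
| 2 | 三浦香代子 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 |
| 3 | 上村玲子 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 3 |
| 4 | 世川勇 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |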



Attempting to save Excel file to: /content/drive/MyDrive/Colab Notebooks/data/downloads/礼拝出席者集計_詳細.xlsx
Checking output directory: /content/drive/MyDrive/Colab Notebooks/data/downloads
Output directory /content/drive/MyDrive/Colab Notebooks/data/downloads is writable.

Calculated output Excel file path: /content/drive/MyDrive/Colab Notebooks/data/downloads/礼拝出席者集計_詳細.xlsx

Saving integrated results to: /content/drive/MyDrive/Colab Notebooks/data/downloads/礼拝出席者集計_詳細.xlsx
Successfully saved the detailed attendance counts to Excel.
Saved file timestamp: 2025-08-09 14:39:41
Saved file size: 26375 bytes

Finished processing all files and saving results.

Verifying file existence and timestamp at /content/drive/MyDrive/Colab Notebooks/data/downloads/礼拝出席者集計_詳細.xlsx from Colab file system:
File found.
File timestamp from Colab: 2025-08-09 14:39:41
File size from Colab: 26375 bytes
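The 朝礼拝 name list printed above contains variants of what appears to be the same name. For example, `ｸﾏﾗｼﾝﾊﾑ･美羽` and `ｸﾏﾗｼﾝﾊﾑ美羽` (half-width katakana, with and without a middle dot) end up as separate rows. Below is a minimal sketch of one way to merge such variants, assuming Unicode NFKC normalization is acceptable for these names. NFKC folds half-width katakana and the half-width middle dot `･` into their full-width forms. The dots can then be removed, much as the unused `normalize_name_by_removing_middle_dot` does. The name `normalize_name` is hypothetical.

```python
import re
import unicodedata

MIDDLE_DOTS = re.compile(r"[・･]")  # full-width U+30FB and half-width U+FF65


def normalize_name(name: str) -> str:
    """Fold compatibility forms with NFKC (half-width katakana -> full-width), then drop middle dots."""
    return MIDDLE_DOTS.sub("", unicodedata.normalize("NFKC", name)).strip()


variants = ["ｸﾏﾗｼﾝﾊﾑ･美羽", "ｸﾏﾗｼﾝﾊﾑ美羽"]
print({normalize_name(v) for v in variants})  # {'クマラシンハム美羽'}
```

This would not fix everything in the list. `中島康文･美津江` looks like two people sharing a surname, so removing the dot would merge them into one bogus name; splitting would be the better move there. `ｸﾏﾗｼﾝﾊﾏ･美羽` differs by a character (`ﾏ` vs `ﾑ`), not just by form, so normalization leaves it separate.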
