# FamilySearch Android App - Sibling Feature Analysis

This notebook analyzes Google Play Store reviews to gauge customer demand for viewing ancestors' siblings in the pedigree view.

**Data**: 36 monthly CSV files (Jan 2023 - Dec 2025) from `data/feedback/android/`

In [1]:
import pandas as pd
import glob
import numpy as np
from pathlib import Path

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 100)

## 1. Load and Explore Data

In [2]:
# Load all CSV files
csv_files = sorted(glob.glob('../data/feedback/android/*.csv'))
print(f"Found {len(csv_files)} CSV files")
print(f"Date range: {Path(csv_files[0]).stem.split('_')[-1]} to {Path(csv_files[-1]).stem.split('_')[-1]}")

Found 36 CSV files
Date range: 202301 to 202512


In [3]:
# Load a single file to examine structure
# The export files are read as UTF-16, so pass the encoding explicitly
sample_df = pd.read_csv(csv_files[0], encoding='utf-16')
print(f"Shape: {sample_df.shape}")
print(f"\nColumns:\n{sample_df.columns.tolist()}")
sample_df.head(3)

Shape: (240, 16)

Columns:
['Package Name', 'App Version Code', 'App Version Name', 'Reviewer Language', 'Device', 'Review Submit Date and Time', 'Review Submit Millis Since Epoch', 'Review Last Update Date and Time', 'Review Last Update Millis Since Epoch', 'Star Rating', 'Review Title', 'Review Text', 'Developer Reply Date and Time', 'Developer Reply Millis Since Epoch', 'Developer Reply Text', 'Review Link']


Unnamed: 0,Package Name,App Version Code,App Version Name,Reviewer Language,Device,Review Submit Date and Time,Review Submit Millis Since Epoch,Review Last Update Date and Time,Review Last Update Millis Since Epoch,Star Rating,Review Title,Review Text,Developer Reply Date and Time,Developer Reply Millis Since Epoch,Developer Reply Text,Review Link
0,org.familysearch.mobile,40436.0,4.4.17,fr,TECNO-CG6,2023-01-01T10:27:16Z,1672568836593,2023-01-01T10:27:16Z,1672568836593,5,,,,,,
1,org.familysearch.mobile,40624.0,4.5.4,it,jackpotlte,2023-01-01T13:11:23Z,1672578683195,2023-01-01T13:11:23Z,1672578683195,4,,La mia esperienza è stata molto buona. Certo funziona meglio negli Stati Uniti!,,,,http://play.google.com/console/developers/8264880723452882397/app/4972259123017277510/user-feedb...
2,org.familysearch.mobile,40624.0,4.5.4,en,a13x,2023-01-01T15:55:27Z,1672588527112,2023-01-01T15:55:27Z,1672588527112,3,,Information is great . Having to login n password when u have app is very irritating,,,,http://play.google.com/console/developers/8264880723452882397/app/4972259123017277510/user-feedb...


In [4]:
# Load all files into a single dataframe
dfs = []
for file in csv_files:
    try:
        df = pd.read_csv(file, encoding='utf-16')
        dfs.append(df)
    except Exception as e:
        print(f"Error loading {file}: {e}")

# Combine all dataframes
all_reviews = pd.concat(dfs, ignore_index=True)
print(f"\nTotal reviews loaded: {len(all_reviews):,}")
print(f"Date range: {all_reviews['Review Submit Date and Time'].min()} to {all_reviews['Review Submit Date and Time'].max()}")


Total reviews loaded: 11,060
Date range: 2014-09-10T16:07:07Z to 2025-12-31T23:05:44Z
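The earliest submit date above (2014) is well before the Jan 2023 start of the file range, so some files evidently contain reviews first submitted before that window. One way to check how many rows that affects is to parse the timestamps and compare them with the start of the window. The following is a minimal sketch on a hypothetical two-row frame (`toy_reviews`, whose update timestamp in the first row is made up for illustration); it assumes the real `all_reviews` frame could be substituted unchanged.

```python
import pandas as pd

# Hypothetical stand-in for all_reviews; values are illustrative only.
toy_reviews = pd.DataFrame({
    "Review Submit Date and Time": ["2014-09-10T16:07:07Z", "2023-01-01T10:27:16Z"],
    "Review Last Update Date and Time": ["2023-02-03T08:00:00Z", "2023-01-01T10:27:16Z"],
})

# Parse the ISO 8601 strings into timezone-aware timestamps.
submit = pd.to_datetime(toy_reviews["Review Submit Date and Time"], utc=True)
updated = pd.to_datetime(toy_reviews["Review Last Update Date and Time"], utc=True)

# Start of the period covered by the file names (Jan 2023).
window_start = pd.Timestamp("2023-01-01", tz="UTC")

# Count rows that fall before the window under each timestamp.
submitted_before_window = int((submit < window_start).sum())
updated_before_window = int((updated < window_start).sum())

print(f"Submitted before window: {submitted_before_window}")
print(f"Last updated before window: {updated_before_window}")
```

If the second count is zero on the real data while the first is not, that would suggest the old reviews appear because they were edited during the window; that explanation is an assumption to verify, not something the output above establishes.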


In [5]:
# Basic statistics
print("Review Text Statistics:")
print(f"Total reviews: {len(all_reviews):,}")
print(f"Reviews with text: {all_reviews['Review Text'].notna().sum():,}")
print(f"Reviews without text: {all_reviews['Review Text'].isna().sum():,}")
print(f"\nLanguage distribution:")
print(all_reviews['Reviewer Language'].value_counts().head(10))

Review Text Statistics:
Total reviews: 11,060
Reviews with text: 4,311
Reviews without text: 6,749

Language distribution:
Reviewer Language
en    4294
es    2756
pt    2675
fr     253
it     160
ru     125
de      94
pl      81
hu      74
ar      72
Name: count, dtype: int64


## 2. Filter for Reviews with Text Content

In [6]:
# Filter for reviews that have actual text content
reviews_with_text = all_reviews[all_reviews['Review Text'].notna() & (all_reviews['Review Text'].str.strip() != '')].copy()
print(f"Reviews with text content: {len(reviews_with_text):,}")
print(f"Percentage with text: {len(reviews_with_text)/len(all_reviews)*100:.1f}%")

Reviews with text content: 4,311
Percentage with text: 39.0%


## 3. Search for Sibling-Related Mentions

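Search the review text for sibling terms (sibling, brother, sister) and for collateral-relative terms (aunt, uncle, nephew, niece, cousin) that imply an ancestor's siblings. The match is a case-insensitive substring search over English keywords only.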

In [7]:
# Define search terms for sibling-related mentions
# English terms only; singular and plural forms are listed explicitly
sibling_keywords = [
    'sibling', 'siblings',
    'brother', 'brothers', 'sister', 'sisters',
    'aunt', 'aunts', 'uncle', 'uncles',
    'nephew', 'nephews', 'niece', 'nieces',
    'cousin', 'cousins'
]

# Join the keywords into a regex alternation (case is ignored later via case=False)
pattern = '|'.join(sibling_keywords)
print(f"Search pattern: {pattern}")

Search pattern: sibling|siblings|brother|brothers|sister|sisters|aunt|aunts|uncle|uncles|nephew|nephews|niece|nieces|cousin|cousins
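Because the pattern above is a plain alternation, it matches substrings inside longer words: `aunt`, for example, also matches "daunting" or "haunt". A possible refinement, shown here only as a sketch and not as the method this notebook uses, wraps the terms in word boundaries and makes the plural `s` optional. The names `base_terms`, `bounded_pattern`, and `toy_text` are hypothetical.

```python
import pandas as pd

# Singular forms only; the optional trailing "s" in the regex covers plurals.
base_terms = ["sibling", "brother", "sister", "aunt", "uncle", "nephew", "niece", "cousin"]

# \b anchors each match to a whole word; (?:...) groups without capturing.
bounded_pattern = r"\b(?:" + "|".join(base_terms) + r")s?\b"

# Illustrative review texts (made up for this sketch).
toy_text = pd.Series([
    "Please show half siblings",
    "The login screen is daunting",
    "Found my 3rd cousins",
])

# Plain substring alternation, equivalent in spirit to the notebook's pattern.
substring_hits = toy_text.str.contains("|".join(base_terms), case=False, na=False)

# Word-bounded variant.
bounded_hits = toy_text.str.contains(bounded_pattern, case=False, na=False, regex=True)

print(pd.DataFrame({"text": toy_text, "substring": substring_hits, "bounded": bounded_hits}))
```

On the toy data, the substring search flags "daunting" while the bounded pattern does not. Applying the bounded pattern to the real reviews could change the match count reported below, so the two results should not be mixed.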


In [8]:
# Search for sibling mentions in review text
sibling_mentions = reviews_with_text[reviews_with_text['Review Text'].str.contains(pattern, case=False, na=False)].copy()
print(f"\nReviews mentioning siblings/related family: {len(sibling_mentions):,}")
print(f"Percentage of all reviews: {len(sibling_mentions)/len(all_reviews)*100:.2f}%")
print(f"Percentage of reviews with text: {len(sibling_mentions)/len(reviews_with_text)*100:.2f}%")


Reviews mentioning siblings/related family: 23
Percentage of all reviews: 0.21%
Percentage of reviews with text: 0.53%
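The keyword list is English only, while the language distribution shown earlier includes many Spanish and Portuguese reviewers, so the match rate may understate demand among non-English reviews. A per-language breakdown of matches would make that gap visible. The sketch below uses a hypothetical four-row frame (`toy`) and a shortened pattern; on the real data, `reviews_with_text` and `pattern` would presumably take their place.

```python
import pandas as pd

# Hypothetical stand-in for reviews_with_text; texts are made up.
toy = pd.DataFrame({
    "Reviewer Language": ["en", "en", "es", "pt"],
    "Review Text": ["Show siblings please", "Great app", "Muy buena", "Gostei"],
})

# Shortened version of the notebook's pattern, for illustration only.
toy_pattern = "sibling|brother|sister"

# Flag rows whose text contains any keyword (case-insensitive).
toy["matched"] = toy["Review Text"].str.contains(toy_pattern, case=False, na=False)

# Matches and total text reviews per reviewer language.
by_language = toy.groupby("Reviewer Language")["matched"].agg(matches="sum", reviews="count")
by_language["match_rate_pct"] = by_language["matches"] / by_language["reviews"] * 100

print(by_language)
```

A language with many text reviews and zero matches is a candidate for translated keywords or manual sampling before drawing conclusions about demand.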


In [9]:
# Display sample of sibling-related reviews
print("Sample reviews mentioning siblings/family:")
print("="*80)
for idx, row in sibling_mentions.head(10).iterrows():
    print(f"\nDate: {row['Review Submit Date and Time']}")
    print(f"Rating: {row['Star Rating']} stars")
    print(f"Language: {row['Reviewer Language']}")
    print(f"Review: {row['Review Text']}")
    print("-"*80)

Sample reviews mentioning siblings/family:
================================================================================

Date: 2023-01-24T17:17:00Z
Rating: 5 stars
Language: en
Review: Ok... Just sayin.... Not dating app for Texans! Stick with goat love, sister wives, and dna checker!
--------------------------------------------------------------------------------

Date: 2023-05-14T17:54:49Z
Rating: 3 stars
Language: en
Review: I believe that the app should make sm option for providing half siblings
--------------------------------------------------------------------------------

Date: 2023-06-14T10:04:47Z
Rating: 4 stars
Language: en
Review: Great for a quick look. Wow, I turned on "see relatives nearby" while at church and found out a lot are 3rd or4th cousins.
--------------------------------------------------------------------------------

Date: 2022-09-02T01:31:50Z
Rating: 4 stars
Language: es
Review: Love it! But please address these 3 issues: 1. Suggest the spouse's immediate family members when attaching people from records! They're often present in bap

## 4. Export Results

In [10]:
# Save sibling-related reviews to CSV for further analysis
output_file = '../data/sibling_mentions.csv'
sibling_mentions.to_csv(output_file, index=False, encoding='utf-8')
print(f"Saved {len(sibling_mentions)} reviews to {output_file}")

Saved 23 reviews to ../data/sibling_mentions.csv
