Data was gathered from Google Fonts API. The data was avaliable as a json file - a list of dictionaries located in one line.
Saving the data in a csv format. 


In [9]:
import json
import csv

# Open the JSON file and load the data
with open('fontinfo.json') as f:
    data = json.loads(f.read())

# Open the CSV file and create a CSV writer object
with open('fontinfo.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    # Write the header row to the CSV file
    header = data[0].keys()
    writer.writerow(header)

    # Write each row of data to the CSV file
    for row in data:
        writer.writerow(row.values())


Fonts can evoke specific moods based on the form or the era they were inspired by. Depending on the project, designers need to convey and communicate a feeling through the design. 

**Serif**: often seen as formal fonts that can evoke an older vibe. Use serifs for long-form copy, like books, blogs, or magazines. The serifs help the reader’s eyes follow the letterforms easily. <br> <br>
**Sans Serif**: one of the most versatile categories. You can use them as display or long-form copy. These letterforms are clean, minimal, and modern-looking. Some fonts in this category can be neutral, while others can have just a touch of personality that can add some zing to your design. <br> <br>
**Script**: whether you use a formal or informal script font, you’ll hands down communicate an old-world vibe. Use these fonts on historical pieces, wedding invitations, and book covers. <br><br>
**Handwriting and Calligraphic**: if you want to evoke a personal feel, this is the font for your project. Mostly informal, this font can vary in styles. Do be careful when choosing one for your project as depending on the style you can add a certain mood that can range from cute to grunge. <br><br>
**Display/Decorative**: like the last few examples, use display as display. These fonts are usually designed with a very specific purpose in mind—to call for attention. Don’t use these fonts at a small scale as some decorations can make them difficult to read. <br><br>

Based on https://design.tutsplus.com/articles/the-different-types-of-fonts-when-to-use-each-font-type-and-when-not--cms-33346

**Sans Serif Association**: 
modern, clean, universal, open, informal, progressive <br><br>

**Script Association**: 
elegance, affectionate, creativity, personal, feminine, fancy <br><br>

**Modern Association**: 
stylish, chic, exclusivity, sharp, fashionable, futuristic <br><br>

**Display / Decorative Association**: 
friendly, unique, expressive, quirky, eclectic <br><br>

Based on Psychology of onscreen type: investigations regarding typeface personality, appropriateness, and impact on document perception. More can be found here: Shaikh, A. D., Chaparro, B. S., & Fox, D. (2006). Perception of Fonts: Perceived Personality Traits and Uses. Usability News, 8(1).

   

In [4]:
import pandas as pd 

fontinfo = pd.read_csv('fontinfo.csv')

fontinfo.head()
#fontinfo["category"].value_counts()



Unnamed: 0,id,family,subsets,weights,styles,defSubset,variable,lastModified,category,version,type
0,abeezee,ABeeZee,"['latin', 'latin-ext']",[400],"['italic', 'normal']",latin,False,2022-09-22,sans-serif,v22,google
1,abel,Abel,['latin'],[400],['normal'],latin,False,2022-09-22,sans-serif,v18,google
2,abhaya-libre,Abhaya Libre,"['latin', 'latin-ext', 'sinhala']","[400, 500, 600, 700, 800]",['normal'],latin,False,2022-09-22,serif,v13,google
3,aboreto,Aboreto,"['latin', 'latin-ext']",[400],['normal'],latin,False,2022-09-22,display,v2,google
4,abril-fatface,Abril Fatface,"['latin', 'latin-ext']",[400],['normal'],latin,False,2022-09-22,display,v19,google


sans-serif     594
display        391
serif          297
handwriting    221
monospace       44
other           14
Name: category 

In [4]:
# search through column "category" for "other"

fontinfo[fontinfo["category"].str.contains("other", na=False)]

Unnamed: 0,id,family,subsets,weights,styles,defSubset,variable,lastModified,category,version,type
368,dseg-weather,DSEG Weather,['latin'],[400],['normal'],latin,False,2020-08-02,other,v0.46,other
369,dseg14,DSEG14,"['classic', 'classic-mini', 'modern', 'modern-...","[300, 400, 700]","['italic', 'normal']",classic,False,2020-08-02,other,v0.46,other
370,dseg7,DSEG7,"['classic', 'classic-mini', 'modern', 'modern-...","[300, 400, 700]","['italic', 'normal']",classic,False,2020-08-02,other,v0.46,other
781,material-icons,Material Icons,['base'],[400],['normal'],base,False,2021-11-15,other,v4,icons
782,material-icons-outlined,Material Icons Outlined,['all'],[400],['normal'],all,False,2021-11-15,other,v4,icons
783,material-icons-rounded,Material Icons Rounded,['all'],[400],['normal'],all,False,2021-11-15,other,v4,icons
784,material-icons-sharp,Material Icons Sharp,['all'],[400],['normal'],all,False,2021-11-15,other,v4,icons
785,material-icons-two-tone,Material Icons Two Tone,['all'],[400],['normal'],all,False,2021-11-15,other,v4,icons
1523,yakuhanjp,YakuHanJP,['japanese'],"[100, 200, 300, 400, 500, 700, 900]",['normal'],japanese,False,2020-08-02,other,v3.3.1,other
1524,yakuhanjps,YakuHanJPs,['japanese'],"[100, 200, 300, 400, 500, 700, 900]",['normal'],japanese,False,2020-08-02,other,v3.3.1,other


Asigning connotation to font categories 

In [7]:

# Create a dictionary mapping each category to its associated connotations
connotations = {
    'sans-serif': 'modern, clean, open, informal, progressive, simple, minimal, contemporary',
    'display': 'bold, creative, eye-catching, decorative, playful, attention-grabbing, unique',
    'serif': 'traditional, elegant, trustworthy, formal, classic, established, reliable',
    'handwriting': 'personal, friendly, whimsical, expressive, elegant, romantic, feminine',
    'monospace': 'technical, precise, retro, futuristic, minimal, structured',
    'other': 'unique, quirky, unconventional, experimental, artistic, eclectic',
}

# Create a new column called "Connotations" based on the values in the "category" column
fontinfo['connotations'] = fontinfo['category'].apply(lambda x: connotations[x])

# Print the updated dataframe to confirm that the new column has been added
print(fontinfo.head())

# save the dataframe to a new CSV file
fontinfo.to_csv('fontswithconnotations.csv', index=False)


         family    Category  rating            id  \
0        Roboto  sans-serif       1        roboto   
1     Open Sans  sans-serif       2     open-sans   
2  Noto Sans JP  sans-serif       3  noto-sans-jp   
3    Montserrat  sans-serif       4    montserrat   
4          Lato  sans-serif       5          lato   

                                         weights                styles  \
0                 [100, 300, 400, 500, 700, 900]  ['italic', 'normal']   
1                 [300, 400, 500, 600, 700, 800]  ['italic', 'normal']   
2                 [100, 300, 400, 500, 700, 900]            ['normal']   
3  [100, 200, 300, 400, 500, 600, 700, 800, 900]  ['italic', 'normal']   
4                      [100, 300, 400, 700, 900]  ['italic', 'normal']   

                                        connotations   character  
0  modern, clean, universal, open, informal, prog...  character1  
1  modern, clean, universal, open, informal, prog...  character1  
2  modern, clean, universal, open, 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df1['character']= df1['Category'].apply(lambda x: character[x])


In [12]:
import os
import shutil

def find_and_copy_ttf_files(src_dir, dst_dir):
    # search for all .ttf files within the src_dir directory and its subdirectories
    for root, dirs, files in os.walk(src_dir):
        for file in files:
            if file.endswith(".ttf"):
                # Construct the source and destination file paths
                src_file_path = os.path.join(root, file)
                dst_file_path = os.path.join(dst_dir, file)
                # Copy the file to the destination directory
                shutil.copy(src_file_path, dst_file_path)

# copy all .ttf files in the "ALLFONTS"  to the "output_folder" directory
find_and_copy_ttf_files("/Users/.../Desktop/State of the art tech/fonts-main/ALLFonts", "/Users/.../Desktop/State of the art tech/collectedttf")


Aquiring the the popularity of fonts from Google fonts analytics

In [17]:
import json
import csv

# Open the input JSON file and load its contents
with open('googlefontsapi.json', 'r') as json_file:
    data = json.load(json_file)

# Open the output CSV file in write mode
with open('popularitysorted.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)


    # Write the header row
    writer.writerow(['family', 'Category', 'Subsets', 'Files'])

    # Write each row of data to the CSV
    for item in data['items']:
        family = item['family']
        category = item['category']
        subsets = ', '.join(item['subsets'])
        files = ', '.join(item['files'].keys())
        link =', '.join(item['files'].values())
        writer.writerow([family, category, subsets, files, link])


Add a ranking column to caputure the font popularity


In [18]:

import pandas as pd 

df = pd.read_csv('popularitysorted.csv', index_col=False)

# Add a new column with the position indices

df["rating"] = range(1, len(df) + 1)

# Write the updated DataFrame to a new CSV file
df.to_csv('popularitysorted.csv', index=False)

  df = pd.read_csv('popularitysorted.csv', index_col=False)


Now let's merge two csv files based on common column ("Family")


In [20]:
# reading csv files
data1 = pd.read_csv('popularitysorted.csv')
data2 = pd.read_csv('fontswithconnotations.csv')
  
# merging two dataframes using the column "Family"
merged = data1.merge(data2, on='family')

# writing the merged dataframe to a new csv file
merged.to_csv('merged.csv', index=False)

merged.head()


Unnamed: 0,family,Category,Subsets,Files,rating,id,subsets,weights,styles,defSubset,variable,lastModified,category,version,type,connotations
0,Roboto,sans-serif,"cyrillic, cyrillic-ext, greek, greek-ext, lati...","100, 100italic, 300, 300italic, regular, itali...",1,roboto,"['cyrillic', 'cyrillic-ext', 'greek', 'greek-e...","[100, 300, 400, 500, 700, 900]","['italic', 'normal']",latin,False,2022-05-12,sans-serif,v30,google,"modern, clean, universal, open, informal, prog..."
1,Open Sans,sans-serif,"cyrillic, cyrillic-ext, greek, greek-ext, hebr...","300, regular, 500, 600, 700, 800, 300italic, i...",2,open-sans,"['cyrillic', 'cyrillic-ext', 'greek', 'greek-e...","[300, 400, 500, 600, 700, 800]","['italic', 'normal']",latin,True,2022-09-22,sans-serif,v34,google,"modern, clean, universal, open, informal, prog..."
2,Noto Sans JP,sans-serif,"japanese, latin","100, 300, regular, 500, 700, 900",3,noto-sans-jp,"['japanese', 'latin']","[100, 300, 400, 500, 700, 900]",['normal'],latin,False,2022-09-27,sans-serif,v42,google,"modern, clean, universal, open, informal, prog..."
3,Montserrat,sans-serif,"cyrillic, cyrillic-ext, latin, latin-ext, viet...","100, 200, 300, regular, 500, 600, 700, 800, 90...",4,montserrat,"['cyrillic', 'cyrillic-ext', 'latin', 'latin-e...","[100, 200, 300, 400, 500, 600, 700, 800, 900]","['italic', 'normal']",latin,True,2022-09-22,sans-serif,v25,google,"modern, clean, universal, open, informal, prog..."
4,Lato,sans-serif,"latin, latin-ext","100, 100italic, 300, 300italic, regular, itali...",5,lato,"['latin', 'latin-ext']","[100, 300, 400, 700, 900]","['italic', 'normal']",latin,False,2022-09-22,sans-serif,v23,google,"modern, clean, universal, open, informal, prog..."


Let's remove the duplicate and not needed columns:


In [53]:


# remove multiple columns from a dataframe by name  
import pandas as pd


df1 = pd.read_csv('merged.csv') 


# # # Remove a column by name
# df1=df1.drop(["type"], axis = 1)

# # # Remove a column by name
# df1=df1.drop(["category"], axis = 1)

# # Remove a column by name
# df1=df1.drop(["Files"], axis = 1)

# # Remove a column by name
# df1=df1.drop(["variable"], axis = 1)

# # Remove a column by name
# df1=df1.drop(["defSubset"], axis = 1)

# # Remove a column by name
# df1=df1.drop(["subsets"], axis = 1)

# # Remove a column by name
# df1=df1.drop(["Subsets"], axis = 1)

# # # Remove a column by name
# df1=df1.drop(["lastModified"], axis = 1)

# # Remove a column by name
# df1=df1.drop(["version"], axis = 1)

# writing the merged dataframe to a new csv file
df1.to_csv('merged.csv', index=False)

df1.head()


Unnamed: 0,family,Category,rating,id,weights,styles,connotations
0,Roboto,sans-serif,1,roboto,"[100, 300, 400, 500, 700, 900]","['italic', 'normal']","modern, clean, universal, open, informal, prog..."
1,Open Sans,sans-serif,2,open-sans,"[300, 400, 500, 600, 700, 800]","['italic', 'normal']","modern, clean, universal, open, informal, prog..."
2,Noto Sans JP,sans-serif,3,noto-sans-jp,"[100, 300, 400, 500, 700, 900]",['normal'],"modern, clean, universal, open, informal, prog..."
3,Montserrat,sans-serif,4,montserrat,"[100, 200, 300, 400, 500, 600, 700, 800, 900]","['italic', 'normal']","modern, clean, universal, open, informal, prog..."
4,Lato,sans-serif,5,lato,"[100, 300, 400, 700, 900]","['italic', 'normal']","modern, clean, universal, open, informal, prog..."


In [60]:

# df = pd.read_csv('merged.csv')

# # Split the connotations column into multiple columns to prepare for one-hot encoding

# for index, row in df.iterrows():
#     for genre in row['connotations'].split(', '):
#         df.at[index, genre] = 1
# #Filling in the NaN values with 0 
# df = df.fillna(0)
# # df.head(20)

# # df to csv
# df.to_csv('merged.csv', index=False)





TypeError: 'builtin_function_or_method' object is not iterable

In [8]:
df = pd.read_csv('merged.csv')

# create df1 with only the columns we want: family, Category, rating, id, weights, styles, connotations
df1 = df[['family', 'Category', 'rating', 'id', 'weights', 'styles', 'connotations']]

character = {
    'sans-serif': 'character1',
    'display': 'character2',
    'serif': 'character3',
    'handwriting': 'character4',
    'monospace': 'character5'

}

# # Create a new column called "character" based on the values in the "category" column
df1['character']= df1['Category'].apply(lambda x: character[x])

# Print the updated dataframe to confirm that the new column has been added
print(df1.head())

# save the dataframe to a new CSV file
df1.to_csv('characterbasedfonts.csv', index=False)
# fontinfo.to_csv('fontswithconnotations.csv', index=False)

df1['weights'].describe()


         family    Category  rating            id  \
0        Roboto  sans-serif       1        roboto   
1     Open Sans  sans-serif       2     open-sans   
2  Noto Sans JP  sans-serif       3  noto-sans-jp   
3    Montserrat  sans-serif       4    montserrat   
4          Lato  sans-serif       5          lato   

                                         weights                styles  \
0                 [100, 300, 400, 500, 700, 900]  ['italic', 'normal']   
1                 [300, 400, 500, 600, 700, 800]  ['italic', 'normal']   
2                 [100, 300, 400, 500, 700, 900]            ['normal']   
3  [100, 200, 300, 400, 500, 600, 700, 800, 900]  ['italic', 'normal']   
4                      [100, 300, 400, 700, 900]  ['italic', 'normal']   

                                        connotations   character  
0  modern, clean, universal, open, informal, prog...  character1  
1  modern, clean, universal, open, informal, prog...  character1  
2  modern, clean, universal, open, 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df1['character']= df1['Category'].apply(lambda x: character[x])


count      1479
unique       57
top       [400]
freq        870
Name: weights, dtype: object

In [2]:
# import pandas as pd

# df = pd.read_csv('merged.csv')

# for index, row in df.iterrows():
#     for genre in row['weights'].split(', '):
#         df.at[index, genre] = 1
# #Filling in the NaN values with 0 
# df = df.fillna(0)

# # df to csv
# df.to_csv('merged1.csv', index=False)


In [None]:
df = pd.read_csv('merged.csv')

In [10]:
import pandas as pd

# Load CSV file into DataFrame
df = pd.read_csv('characterbasedfonts.csv')

# Convert weights column from list to comma-separated string
df['weights'] = df['weights'].apply(lambda x: ','.join(map(str, eval(x))))

# View updated DataFrame
print(df.head())

# Save updated DataFrame to CSV file
df.to_csv('characterbasedfonts.csv', index=False)

         family    Category  rating            id  \
0        Roboto  sans-serif       1        roboto   
1     Open Sans  sans-serif       2     open-sans   
2  Noto Sans JP  sans-serif       3  noto-sans-jp   
3    Montserrat  sans-serif       4    montserrat   
4          Lato  sans-serif       5          lato   

                               weights                styles  \
0              100,300,400,500,700,900  ['italic', 'normal']   
1              300,400,500,600,700,800  ['italic', 'normal']   
2              100,300,400,500,700,900            ['normal']   
3  100,200,300,400,500,600,700,800,900  ['italic', 'normal']   
4                  100,300,400,700,900  ['italic', 'normal']   

                                        connotations   character  
0  modern, clean, universal, open, informal, prog...  character1  
1  modern, clean, universal, open, informal, prog...  character1  
2  modern, clean, universal, open, informal, prog...  character1  
3  modern, clean, universal,

In [77]:
import re
import pandas as pd
import csv

# to help with font presentation and file access:

import os

folder_path = "/Users/paulinagdaniec/Desktop/State of the art tech/collectedttf"  


for filename in os.listdir(folder_path):
    if filename.endswith(".ttf"):
        new_filename = filename.replace("wght", "").replace("wdth", "").replace("[GRADXOPQXTRAYOPQYTASYTDEYTFIYTUCslnt]", "").replace("[XROTYROT]", "").replace("[BNCEINFMSPAC]", "").replace("[CASLCRSVMONOslnt]", "").replace("[opsz]", "").replace("[wdth,wght]", "").replace("[ELGRELSH]", "").replace("[slnt]", "").replace("[YOPQ]", "").replace("[EDPTEHLT]", "").replace("[HEXP]", "").replace("[,]", "").replace("[", "").replace("]", "").replace(",", "").replace("opsz","").replace("SOFTWONK","")
        old_filepath = os.path.join(folder_path, filename)
        new_filepath = os.path.join(folder_path, new_filename)
        os.rename(old_filepath, new_filepath)





KeyboardInterrupt: 

In [81]:
import os
import pandas as pd


# getting the font names from the csv file to compare with the font files in the folder

# Load availablefonts.csv into a DataFrame
df_fonts = pd.read_csv("characterbasedfonts.csv")

# Define the path to the folder containing the .ttf files
fonts_folder_path = "/Users/paulinagdaniec/Desktop/State of the art tech/collectedttf"  

# Loop through each row in the DataFrame
for index, row in df_fonts.iterrows():
    # Get the font name/id from the "id" column
    font_id = row["id"]
    
    # Replace "-" characters in font id with spaces
    font_id = font_id.replace("-", "")
    
    # Get the list of file names in the fonts folder
    font_file_names = os.listdir(fonts_folder_path)
    
    # Filter the list to include only files containing the font id in their name
    matching_file_names = [file_name for file_name in font_file_names if font_id.lower() in file_name.lower()]
    
    # Get the list of matching font names without the .ttf extension
    matching_font_names = [os.path.splitext(file_name)[0] for file_name in matching_file_names]
    
    # Join the matching font names into a comma-separated string
    matching_font_names_str = ",".join(matching_font_names)
    
    # Add the matching font names to a new column in the DataFrame
    df_fonts.at[index, "available_fonts"] = matching_font_names_str
    
# Save the updated DataFrame to a new CSV file
df_fonts.to_csv("avfonts.csv", index=False)



Removing fonts from data base that dont have a matching .ttf file.

In [82]:
df = pd.read_csv("avfonts.csv")

# Filter out rows with empty values in the specified column
df = df[df["available_fonts"].notnull()]

# Save the updated DataFrame to a new CSV file
df.to_csv("avfonts.csv", index=False)