# Mineral name silliness

I used [tabula](http://tabula.technology/) to get the data from [this 200+ page PDF table from IMA](http://nrmima.nrm.se//imalist.htm). Technology is awesome!

Here's the spreadsheet with the data. (Not checked rigorously, but the first column seems OK and that's what I wanted.)

https://docs.google.com/spreadsheets/d/1yS5bM-ld_JnuOcamn6I4WUnJnxTNfBszuY5i3ijpk7Q/edit?usp=sharing

Some prelims:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline

import pandas as pd
import requests
import io
import re

Read the data:

In [2]:
with open('./data/IMA_mineral_names.txt', 'r') as f:
    # One name per line; skip blank lines so the name[0] lookups further down cannot fail.
    names = [line.strip() for line in f if line.strip()]

## Longest names

In [3]:
longest = sorted(names, key=len, reverse=True)[:5]
[(w, len(w)) for w in longest]

[('fluorotetraferriphlogopite', 26),
 ('hydroniumpharmacosiderite', 25),
 ('hydroniumpharmacoalumite', 24),
 ('hydroxymanganopyrochlore', 24),
 ('magnesiochlorophoenicite', 24)]

## Most letters of alphabet

In [4]:
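# For each count of distinct letters (1 to 26), keep a running list of names,
# appending a name only if it is no longer than the last one kept.
# The placeholder is longer than any real name, so the first name always gets in.
placeholder = 30 * '.'
letters = {n: [placeholder] for n in range(1, 27)}
for w in names:
    current = letters[len(set(w))][-1]
    if len(w) <= len(current):
        letters[len(set(w))].append(w)
# Drop the placeholders before displaying.
{k: [w for w in v if w != placeholder] for k, v in letters.items()}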

{1: [],
 2: [],
 3: ['ice', 'tin'],
 4: ['eitelite',
  'eveite',
  'gold',
  'iron',
  'lead',
  'lime',
  'opal',
  'talc',
  'urea',
  'zinc'],
 5: ['agaite', 'beryl', 'niter', 'topaz', 'trona', 'uvite'],
 6: ['abellaite',
  'abuite',
  'afmite',
  'ajoite',
  'albite',
  'augite',
  'baryte',
  'cerium',
  'curite',
  'davyne',
  'gayite',
  'gypsum',
  'hakite',
  'halite',
  'humite',
  'jusite',
  'mohite',
  'nickel',
  'olgite',
  'omsite',
  'paxite',
  'pyrite',
  'quartz',
  'rayite',
  'rutile',
  'schorl',
  'silver',
  'spinel',
  'surite',
  'umbite',
  'zircon'],
 7: ['acanthite',
  'acetamide',
  'achalaite',
  'adachiite',
  'adrianite',
  'aerugite',
  'aheylite',
  'alarsite',
  'aleksite',
  'alunite',
  'arsenic',
  'arupite',
  'azurite',
  'backite',
  'brucite',
  'caoxite',
  'cardite',
  'cavoite',
  'celsian',
  'chaoite',
  'corkite',
  'cuprite',
  'cyprine',
  'dalyite',
  'dozyite',
  'dravite',
  'dualite',
  'fangite',
  'flamite',
  'gahnite',
  'glad

That is pretty gross, I apologize.

Anyway, the winner is "hydrobasaluminite", with 15 letters of the alphabet.
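
If all you want is the winner, ranking names by their number of distinct letters is much less gross. A minimal sketch, using a small hand-picked `sample` list and a helper called `distinct_letters` (both are my own assumptions for illustration; in the notebook you would pass `names` instead):

```python
# Hypothetical sample; swap in the full `names` list in the notebook.
sample = ['ice', 'gold', 'quartz', 'hydrobasaluminite',
          'fluorotetraferriphlogopite']

def distinct_letters(word):
    """Return how many different letters the word uses."""
    return len(set(word))

# max() returns the first name with the highest distinct-letter count.
winner = max(sample, key=distinct_letters)
print(winner, distinct_letters(winner))
```

Note that the longest name in the sample is not the winner: length and letter variety are different things.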

## Shortest names with all the vowels


In [5]:
sorted([w for w in names if not (set('aeiou') - set(w))], key=len)[:5]

['rouaite', 'anduoite', 'aurorite', 'ourayite', 'poubaite']

## Names with no vowels

In [6]:
[w for w in names if not set(w) & set('aeiou')]

[]

## Most unique letters, no repeats

In [7]:
sorted([w for w in names if len(w) == len(set(w))], key=len, reverse=True)[:5]

['hydrocalumite',
 'plumbonacrite',
 'brandholzite',
 'fluorcaphite',
 'hansblockite']

## Shortest name for each letter of the alphabet

In [8]:
result = {chr(65+i).lower():'.'*30 for i in range(26)}
for name in names:
    if len(name) < len(result[name[0]]):
        result[name[0]] = name
result

{'a': 'abuite',
 'b': 'beryl',
 'c': 'cerium',
 'd': 'davyne',
 'e': 'elyite',
 'f': 'fangite',
 'g': 'gold',
 'h': 'hafnon',
 'i': 'ice',
 'j': 'jusite',
 'k': 'keyite',
 'l': 'lead',
 'm': 'minium',
 'n': 'niter',
 'o': 'opal',
 'p': 'paxite',
 'q': 'quartz',
 'r': 'rayite',
 's': 'schorl',
 't': 'tin',
 'u': 'urea',
 'v': 'vaesite',
 'w': 'wadeite',
 'x': 'xieite',
 'y': 'yagiite',
 'z': 'zinc'}

## Longest name for each letter of the alphabet

In [9]:
result = {chr(65+i).lower():'' for i in range(26)}
for name in names:
    if len(name) > len(result[name[0]]):
        result[name[0]] = name
result

{'a': 'ammoniomagnesiovoltaite',
 'b': 'bariopharmacosiderite',
 'c': 'carbonatecyanotrichite',
 'd': 'disulfodadsonite',
 'e': 'erythrosiderite',
 'f': 'fluorotetraferriphlogopite',
 'g': 'galloplumbogummite',
 'h': 'hydroniumpharmacosiderite',
 'i': 'isoferroplatinum',
 'j': 'jacquesdietrichite',
 'k': 'kenoplumbomicrolite',
 'l': 'lukkulaisvaaraite',
 'm': 'magnesiochlorophoenicite',
 'n': 'natropharmacosiderite',
 'o': 'oxycalciopyrochlore',
 'p': 'phosphoellenbergerite',
 'q': 'quetzalcoatlite',
 'r': 'reinhardbraunsite',
 's': 'strontiopharmacosiderite',
 't': 'thalliumpharmacosiderite',
 'u': 'uytenbogaardtite',
 'v': 'vandendriesscheite',
 'w': 'wilhelmvierlingite',
 'x': 'xiangjiangite',
 'y': 'yangzhumingite',
 'z': 'zhemchuzhnikovite'}
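
The shortest and longest tables can also be built in one go by grouping the names on their first letter. This is only a sketch on a made-up `sample` list (an assumption for illustration, not the real data), and unlike the cells above it simply omits letters that have no names:

```python
from collections import defaultdict

# Hypothetical sample; swap in the full `names` list in the notebook.
sample = ['ice', 'iron', 'isoferroplatinum', 'tin', 'talc', 'topaz', 'zinc']

# Group names by their first letter.
by_initial = defaultdict(list)
for name in sample:
    by_initial[name[0]].append(name)

# min() and max() return the first item among ties, which matches the
# strict < and > comparisons used in the loops above.
extremes = {letter: (min(group, key=len), max(group, key=len))
            for letter, group in sorted(by_initial.items())}
print(extremes)
```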

## Longest list of minerals with fewest unique letters

This is getting ridiculous... anyway, one of my searches turned up `['ice', 'teineite', 'cetineite', 'tinticite']` (29 letters in total, but only 5 distinct ones) and I wondered if there were more. It turns out there is just one: 'tin'.

In [11]:
[n for n in names if not set(n) - set('icent')]

['cetineite', 'ice', 'teineite', 'tin', 'tinticite']

I'm sure someone can beat that: 5 minerals, 32 letters, only 5 unique.
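
If you want to try, here is a rough sketch of a search. It is a heuristic, not an exhaustive one: it only tries alphabets that are exactly the letter set of some name, so an alphabet that only appears as the union of several smaller names would be missed. The function name `best_alphabets`, its parameter `k`, and the `sample` list are all my own assumptions for illustration.

```python
def best_alphabets(names, k=5):
    """Rank k-letter alphabets by how many names they can spell."""
    # Only names with at most k distinct letters can fit a k-letter alphabet.
    small = [(w, frozenset(w)) for w in names if len(set(w)) <= k]
    # Candidate alphabets: letter sets of names that use exactly k letters.
    candidates = {s for _, s in small if len(s) == k}
    ranked = []
    for alphabet in candidates:
        # Keep every name whose letters all come from this alphabet.
        hits = sorted(w for w, s in small if s <= alphabet)
        total = sum(len(w) for w in hits)
        ranked.append((len(hits), total, ''.join(sorted(alphabet)), hits))
    # Most names first, then most total letters.
    return sorted(ranked, reverse=True)

# Hypothetical sample; swap in the full `names` list in the notebook.
sample = ['ice', 'tin', 'teineite', 'cetineite', 'tinticite',
          'gold', 'opal', 'quartz']
count, total, alphabet, hits = best_alphabets(sample)[0]
print(count, total, alphabet, hits)
```

Changing `k` lets you play the same game with 4 or 6 letters.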