# Astrology Verification via Language Model

This Jupyter notebook introduces an innovative project aimed at testing the validity of astrology. The project involves the use of a language model to analyze the biographies of people born on specific dates, corresponding to each astrological sign, and summarizing their characteristics. These summaries are then used to see if a certain astrological sign could be assigned based on the identified characteristics.

The individuals are selected based on their birth date and their fame, ensuring a rich biography for the language model to process. The summaries generated by the language model serve as a character analysis based on the individuals' biographies. The language model is then tasked to assign an astrological sign to each person based on these summaries.

The underlying assumptions are that the character analysis based on the biography is accurate and that the correct astrological sign can be determined from these characteristics. This approach offers an intriguing way to examine the claims of astrology through the lens of data analysis and natural language processing.

The biographies will be processed through a local Large Language Model, or Ollama. This tool will assign astrology signs, assuming the model has absorbed enough modern astrological information to make similar conclusions about people as typically done in astrology.

First, we will create a data file by randomly selecting a certain number of renowned individuals born at the midpoint of an astrological sign. This ensures that the characteristics of the specific sign are at their strongest. We will retrieve the names of these individuals, along with the Wikipedia links to their biographies.

Next, we will automatically scrape all the biographies into a CSV or SQL file. This will result in a table containing birthdates, names, and biographies.

Afterward, the OLAMA model will extract characteristics from this data. We will ensure the few-shot prompt functions correctly and verify that it provides the required results.

In this section, we'll loop through the biographies and use them as context for deriving personal information. If there's a 'Personal Life' section available in the Wikipedia page, we'll just take this section. Otherwise, we'll use the whole biography. The derived personal information will be inputted into the OLAMA() function, resulting in a list of short characteristics for each individual.

In [None]:
# PSEUDOCODE

# Importing necessary libraries
import pandas as pd
import wikipediaapi

# Load the data
data = pd.read_csv("biographies.csv")

# Initialize Wikipedia API
wiki_wiki = wikipediaapi.Wikipedia('en')

# Loop through biographies
characteristics = []
for biography in data['Biography']:
    # Check for Personal Life section
    page_py = wiki_wiki.page(biography)
    if 'Personal life' in page_py.sections:
        context = page_py.section_by_title('Personal life').text
    else:
        context = page_py.text

    # Use the context for deriving personal information
    characteristics.append(OLAMA(context))

# Add the characteristics to the data
data['Characteristics'] = characteristics

# Save the data
data.to_csv("biographies_with_characteristics.csv", index=False)


# Ollama

Invoking the local model on the macbook.

In [None]:
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")

In [None]:
llm.invoke('What is 2 + 2?')

We will use a method "few-shot-prompting" to generate the data that we want for our analysis. 

In [None]:
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

examples = [
    {"input": "2+2", "output": "4"},
    {"input": "2+3", "output": "5"},
]

# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

print(few_shot_prompt.format())

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", """You are a wondrous wizard of math. 
                        Just use the Human and AI conversation as 
                        an example and answer just to the latest question"""),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)

In [None]:
chain = final_prompt | llm

chain.invoke({"input": "What's the square of a triangle?"})

So, lets use it now to see if it can generate a astrological analysis, based on the knowledge of astrology that the language model might have. To test it, let's first read an astrological analysis of two to three people, their short biography and then the astrological sign in which they were born. Then we put their biographies as input, we let the model behave like an astrologer, and then see if we end up having the same conclusion. We ask the model to give a reason for why it chose what it chose. 

We have to erase the names of the persons. Write a method for that. It is possible to cut out the name of the person manually, but then the text is still so descriptive, that the language model will know who it is a bout, including the date of birth.

So the characteristics first go through a prompt like this: "Describe these traits as if they were from a random person
... , make no reference to Barack Obama"


But then still, much reference is made to the activities of the person and it could be easily known what he/she has done, that's why we throw it through another prompt. "Just summarize the characteristics of the person, without mentioning in any 
... way examples of his/her behavior"

In [None]:
example_prompt = PromptTemplate(
    input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)

print(example_prompt.format(**examples[0]))

In [None]:
prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    suffix="Question: {input}",
    input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))

In [None]:
print(prompt)

In [None]:
chain = prompt | llm

chain.invoke({"input": "Who was the father of Mary Ball Washington?"})

# Using Replicate

In [1]:
import replicate

import os

# Set the REPLICATE_API_TOKEN environment variable
os.environ['REPLICATE_API_TOKEN'] = 'r8_dnYrQg9fYdsHhyJpejnzmJBUcWWCbWT2zVyLT'

# Verify that the environment variable is set
print(os.environ['REPLICATE_API_TOKEN'])


r8_dnYrQg9fYdsHhyJpejnzmJBUcWWCbWT2zVyLT


In [35]:
person = "Billie Eilish"


prompt = f""" Describe the positive and negative character traits of {person}
"""
print(prompt)


input = {
    "prompt": prompt,
    "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
}

output = replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input=input
)
first_output = "".join(output)
print(first_output)

 Describe the positive and negative character traits of Billie Eilish

Billie Eilish is a talented and influential artist known for her unique style, music, and persona. Here are some positive and negative character traits associated with Billie Eilish:

Positive Traits:

1. Authenticity: Billie Eilish is unapologetically herself, refusing to conform to industry standards or societal expectations. She stays true to her art and vision, which resonates with her fans.
2. Creativity: Eilish is a creative genius, constantly pushing boundaries with her music, visuals, and performances. Her innovative approach has inspired a new generation of artists and fans.
3. Confidence: Billie exudes confidence and self-assurance, which is infectious and empowering to her audience. She's not afraid to speak her mind and stand up for what she believes in.
4. Empathy: Eilish has shown compassion and understanding towards her fans, often using her platform to raise awareness about mental health, environment

In [83]:
def replicate_llama70b(prompt):
    input = {
        "prompt": prompt,
        "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    }

    output = replicate.run(
        "meta/meta-llama-3-70b-instruct",
        input=input
    )
    return "".join(output)

def characteristics_of(person):
    prompt = f""" Describe the positive and negative character traits of {person}
            """
    
    answer = replicate_llama70b(prompt)
    
    return answer

def unpersonal_characteristics(characteristics):

    prompt = f"""

    positive and negative traits: "{characteristics}"

    Based on these positive and negative traits, make
    a general overview of characteristics while making no reference to {person}. Just summarize the characteristics 
    of the person. So don't mention in any way examples of his/her 
    behavior. """
    
    answer = replicate_llama70b(prompt)

    return answer

def assign_zodiac_to(traits):
    prompt = f""" traits: "{traits}""
            question: "What could be the astrology sign of this person based on these traits?"
            answer: [just answer with one word, for example: "Pisces", "Virgo", not two!]

            """
    answer = replicate_llama70b(prompt)
    return answer

def predicted_astro_sign(person):
    
    # generate characteristics of the person
    characteristics = characteristics_of(person)
    
    # transform the characteristics to unpersonal traits
    unpersonal_traits = unpersonal_characteristics(characteristics)
    
    # draw a zodiac sign based on the traits
    astro_sign = assign_zodiac_to(unpersonal_traits)
    
    #TODO: write something to redo the last method in case 
    # it does not output a single word
    
    return astro_sign


person = 'Vincent van Gogh'
predicted_astro_sign(person)
    

'Pisces'

In [None]:
prompt = f"""

happenings: "{first_output}"

Based on these happenings, make
a general overview of characteristics while making no reference to {person}. Just summarize the characteristics 
of the person. So don't mention in any way examples of his/her 
behavior. """

print(prompt)

input = {
    "prompt": prompt,
    "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
}

output = replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input=input
)

second_output = "".join(output)
print(second_output)

In [None]:


context = second_output

prompt = f""" context: "{context}""

question: "What could be the astrology sign of this person based on these characteristics?"

answer: [just answer with one word, for example: "Pisces", "Virgo", not two!]



"""


print(prompt)
input = {
    "prompt": prompt,
    "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
}

output = replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input=input
)
third_output = "".join(output)

print(third_output)


# Scraping Astro websites



In [44]:
from bs4 import BeautifulSoup
import requests

In [45]:
source = requests.get("https://astro-charts.com/persons/top/aries/").text

https://astro-charts.com/persons/top/virgo/

In [66]:
astrological_signs = [
    "Aries",
    "Taurus",
    "Gemini",
    "Cancer",
    "Leo",
    "Virgo",
    "Libra",
    "Scorpio",
    "Sagittarius",
    "Capricorn",
    "Aquarius",
    "Pisces"
]

for sign in astrological_signs:
    source = requests.get(f"https://astro-charts.com/persons/top/{sign.lower()}/").text


aries
taurus
gemini
cancer
leo
virgo
libra
scorpio
sagittarius
capricorn
aquarius
pisces


In [47]:
soup = BeautifulSoup(source, 'html')
print(soup.prettify())

<!DOCTYPE html>
<!--[if (lt IE 9) ]> <html class="ie"> <![endif]-->
<!--[if (gt IE 9)|!(IE)]><!-->
<html class="" lang="en">
 <!--<![endif]-->
 <head>
  <meta charset="utf-8"/>
  <meta content="a717c29f665e87e5e215d3babc53acec" name="p:domain_verify">
   <meta content="
  Astrology Charts for . Explore the astrological birth charts of famous people and celebrities with an advanced search of planets, aspects, and patterns to deepen your study of astrology. 
" name="description"/>
   <meta content="True" name="HandheldFriendly"/>
   <meta content="width=device-width, initial-scale=1" name="viewport"/>
   <meta content="all" name="robots"/>
   <meta content="text/html; charset=utf-8" http-equiv="content-type">
    <title>
     Top Celebrities with Sun in Aries
    </title>
    <link href="/client/common/css/base.min.e22ca37076b3.css" rel="stylesheet" type="text/css"/>
    <link href="/client/common/images/site/apple-touch-icon.295a64db8994.png" rel="apple-touch-icon"/>
    <link href="/cl

In [59]:
match = soup.find('div', class_="celeb-info").p
print(match)

<p>Aries</p>


In [64]:


match = soup.find('div', class_="celeb-info")
print(match)

match.find_all('p')[1].text

<div class="celeb-info">
<p>Aries</p>
<p>Matthew Healy</p>
<p>Apr 08, 1989</p>
</div>


'Matthew Healy'

In [65]:
for match in soup.find_all('div', class_="celeb-info"):
    name = match.find_all('p')[1].text
    print(name)

Matthew Healy
Lady Gaga
Conan O'Brien
Quavo
Mariah Carey
Kourtney Kardashian
Victoria Beckham
Sarah Jessica Parker
Anya Taylor-Joy
Selena
Kristen Stewart
PinkPantheress
Heath Ledger
Emma Watson
Amanda Bynes
Pedro Pascal
Anitta (singer)
Halle Bailey
Celine Dion
Jackson Wang
Marlon Brando
Robert Downey, Jr.
Logan Paul
Leighton Meester
James Franco
Hayden Christensen
Kate Hudson
Quentin Tarantino
Jack Antonoff
Vincent van Gogh
Jennifer Garner
Lalisa Manoban
Big Sean
Reese Witherspoon
Miranda Kerr
Elle Fanning
Summer Walker
Rosie Huntington-Whiteley
Paul Rudd
Keira Knightley
Elton John
Diana Ross
Lisa Manoban
Leonardo da Vinci
Chester Bennington
Martin Lawrence
Maynard James Keenan
Jessica Chastain
Eddie Murphy
Maya Angelou
Victoria Pedretti
Aretha Franklin
Lil Nas X
Damon Albarn
Hugh Hefner
Suge Knight
Danielle Bregoli
Pharrell Williams
Sadie Sink
Seth Rogen
Shay Mitchell
Sarah Michelle Gellar
Jill Scott
Lily James
Katie Maloney
Jayne Mansfield
Saoirse Ronan
Marvin Gaye
Joan Crawford
Kyri

In [70]:
import pandas as pd

astrological_signs = [
    "Aries",
    "Taurus",
    "Gemini",
    "Cancer",
    "Leo",
    "Virgo",
    "Libra",
    "Scorpio",
    "Sagittarius",
    "Capricorn",
    "Aquarius",
    "Pisces"
]

# List to store data
data = []

for sign in astrological_signs:
    source = requests.get(f"https://astro-charts.com/persons/top/{sign.lower()}/").text
    for match in soup.find_all('div', class_="celeb-info"):
        name = match.find_all('p')[1].text
        data.append({"Name": name, "Sign": sign})

        # Create DataFrame
df = pd.DataFrame(data)

# Display DataFrame
print(df.head())

# Save DataFrame to a CSV file
# df.to_csv('astrological_signs.csv', index=False)

            Name   Sign
0  Matthew Healy  Aries
1      Lady Gaga  Aries
2  Conan O'Brien  Aries
3          Quavo  Aries
4   Mariah Carey  Aries


In [82]:
# Initialize the new column with default values (e.g., None)
df['Predicted'] = None

# Loop through the DataFrame using basic indexing
for i in range(5):
    name = df.iloc[i, 0]  # Get the name from the first column
    predicted_result = predicted_astro_sign(name)
    df.iloc[i, 2] = predicted_result  # Set the predicted result in the third column

# Display the updated DataFrame
print(df)

                   Name    Sign Predicted
0         Matthew Healy   Aries     Libra
1             Lady Gaga   Aries   Scorpio
2         Conan O'Brien   Aries    Gemini
3                 Quavo   Aries       Leo
4          Mariah Carey   Aries       Leo
...                 ...     ...       ...
1195      Ewan McGregor  Pisces      None
1196      Ty Dolla Sign  Pisces      None
1197        Gary Oldman  Pisces      None
1198       Julia Stiles  Pisces      None
1199  Matthew Broderick  Pisces      None

[1200 rows x 3 columns]


# Glove embeddings



In [84]:
import numpy as np
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample text data and labels
texts = ["This is a positive text", "This is a negative text"]
labels = [1, 0]  # 1 for positive, 0 for negative

In [86]:
# Load GloVe embeddings (e.g., 100-dimensional vectors)
embedding_index = {}
with open('glove.6B.100d.txt', 'r', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embedding_index[word] = coefs

In [87]:
# Tokenize the text data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index
padded_sequences = pad_sequences(sequences, padding='post')

# Create an embedding matrix
embedding_dim = 100
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    embedding_vector = embedding_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

In [88]:
# Convert texts to padded sequences
padded_sequences = pad_sequences(sequences, padding='post')

In [89]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Build the model
model = Sequential([
    Embedding(input_dim=len(word_index) + 1,
              output_dim=embedding_dim,
              weights=[embedding_matrix],
              input_length=padded_sequences.shape[1],
              trainable=False),
    LSTM(64),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 5, 100)            700       
                                                                 
 lstm (LSTM)                 (None, 64)                42240     
                                                                 
 dense (Dense)               (None, 1)                 65        
                                                                 
Total params: 43005 (167.99 KB)
Trainable params: 42305 (165.25 KB)
Non-trainable params: 700 (2.73 KB)
_________________________________________________________________


In [90]:
# Convert labels to numpy array
labels = np.array(labels)

# Train the model
history = model.fit(padded_sequences, labels, epochs=5, validation_data=(padded_sequences, labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [91]:
# New texts for prediction
new_texts = ["This text is very positive", "This text is very negative"]
new_sequences = tokenizer.texts_to_sequences(new_texts)
new_padded_sequences = pad_sequences(new_sequences, padding='post', maxlen=padded_sequences.shape[1])

# Predict
predictions = model.predict(new_padded_sequences)

for text, prediction in zip(new_texts, predictions):
    print(f"Text: '{text}' - Prediction: {prediction[0]}")

Text: 'This text is very positive' - Prediction: 0.5329461097717285
Text: 'This text is very negative' - Prediction: 0.4332428574562073
