# Predicting astrological signs of celebrities with few-shot prompting in a local LLM

## Description

This research aims to investigate the potential validity of astrology by predicting the zodiac signs of various celebrities based on their characteristics. For this, I use a language model to generate detailed descriptions of celebrities’ traits, such as motivations, values, behaviors, and their healthy and unhealthy sides. By processing this information, I attempt to assign the correct zodiac sign to each celebrity. The accuracy of these predictions will be evaluated to determine whether there is any non-random association between the generated traits and the assigned zodiac signs.

# Plan


*1.	Introduction to Prompt Engineering for character description and zodiac sign analysis*
- Develop effective prompts to ensure the language model outputs relevant and unbiased descriptions of celebrities’ traits.
- Avoid references to specific actions, achievements, or the name of the person to maintain the blindfolded nature of the analysis.
- Infer the zodiac sign from a character description.

*2. Data Retrieval and Preprocessing*
- Collect the astrological data of a few hundred celebrities by scraping an astrological website.

*3. Data Analysis*
- Use the Mistral-7B language model to generate concise and accurate descriptions of these characteristics.
- Implement algorithms to assign astrological signs based on the generalized characteristics.
- Compare the predicted zodiac signs with the actual signs to evaluate the accuracy of the predictions.

*4. Optimization and Improvement*
- Refine the methods and prompts to improve prediction accuracy and give the analysis a second and potentially more effective try.

*5.	Accuracy Evaluation and Analysis*
- Calculate the accuracy of the predictions to determine if the results are significantly better than random chance (greater than 1/12 or approximately 8.3%).
- Analyze the findings to see if there is any evidence to suggest that astrology might have some validity in determining personality traits.

*6.	Results and Conclusions*
- Summarize the findings and discuss the implications of the results.
- Address the limitations of the study and suggest potential areas for future research.

# 1. Introduction to Prompt Engineering for character description and zodiac sign analysis


To extract the personal traits of celebrities, I could have gone through the Wikipedia pages of the celebrities in the dataset. Unfortunately, Wikipedia mostly focuses on achievements. For astrology we need personality traits, and these are scattered across the web in blog posts, reviews and forums. Since the freely available language models have been trained on billions of web pages, including the sources that describe the characteristics of celebrities, the information that comes out of a large language model probably reflects those descriptions quite well. With the right prompting, these models can be steered to describe the characteristics of a celebrity rather than his or her activities and achievements. We will test this out soon.

The first step in building this code is to write effective prompts that get the output we want from the language model. As a language model, I used Mistral-7B. This is a relatively small model of around 4 GB that runs locally and is reportedly better than Meta's (Facebook's) Llama model with an even larger number of parameters. Running a language model for hours is quite expensive, because you pay not only for the usage of the model but also for the GPU it needs. This model was attractive because it runs on the M2 chip of my MacBook and still generates the information needed for this analysis.

Let's start by writing the first method, which takes a prompt as input and returns generated text as output. We need it to generate the text for the analysis. We import a few classes from the LangChain library and connect to a local Mistral model served through Ollama. LangChain uses a specific way of combining the elements needed to produce the output, called chaining: the elements are put together on one line of code with a pipe character (`|`) between them.
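To make the pipe idea concrete, here is a toy version in plain Python. It only illustrates how `|` can compose steps through `__or__`; the `Step` class and the three example steps are hypothetical and are not LangChain's actual implementation.

```python
class Step:
    """A toy pipeline step (hypothetical, not a LangChain class)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # "a | b" builds a new step that runs a first and then b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Three stand-in steps: fill a prompt, fake a model, clean the output
fill_prompt = Step(lambda d: f"Question: {d['input']}")
fake_llm = Step(lambda prompt: prompt.upper())
strip_output = Step(lambda text: text.strip())

toy_chain = fill_prompt | fake_llm | strip_output
result = toy_chain.invoke({"input": "is the sky blue?  "})
print(result)
```

The real chain in the next cell has the same shape: prompt template, then model, then output parser.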


In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.llms import Ollama

def local_mistral(question):
    """
    Generates a response to a given question using the Mistral model via LangChain.

    Args:
        question (str): The question to be answered by the model.

    Returns:
        str: The response generated by the model.
    """
    # Initialize the LLM model "mistral"
    llm = Ollama(model="mistral")
    
    # Create a prompt template with predefined system and user messages
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a proficient and knowledgeable person"),  # System message for setting context
        ("user", "{input}") 
    ])
    
    # Initialize the output parser to handle the model's response
    output_parser = StrOutputParser()
    
    # Chain the prompt, model, and output parser together
    chain = prompt | llm | output_parser

    # Invoke the chain with the input question and return the response
    return chain.invoke({"input": f"{question}"})


# Example usage of the function
response = local_mistral("Do you think astrology is BS?")
print(response)

 I am a model and do not hold personal beliefs. However, it's important to note that while astrology has been practiced for thousands of years across many cultures, there is no scientific evidence that supports the claim that the positions of celestial bodies influence human affairs or personality traits. Some people find comfort or meaning in their horoscopes, and that's perfectly fine, as long as they understand it's not based on empirical fact.


This works well. Now we can try to generate the information we want about the celebrities. First, it deserves some thought what information that is. Astrology is about the inner life of a person. A celebrity's career is surely influenced by his or her qualities, but the analysis is still about qualities and tendencies, not about achievements, professions, and the like. Here comes the first difficulty of our approach: the web is full of information about actions and achievements and has much less about the internal qualities and deficiencies of the person described. The prompt should therefore produce an answer that in some way reflects the internal state of the celebrities we analyze.

In [3]:
def characteristics_of(person):
    """
    Generates a detailed analysis of a person's characteristics, including motivations,
    values, behaviors, and their healthy and unhealthy sides.

    Args:
        person (str): The name of the person whose characteristics are to be analyzed.

    Returns:
        str: The detailed analysis of the person's characteristics.
    """
    # Create a prompt string with the specified person's name
    prompt = f"""What are the motivations, values, and 
            behaviors of {person}. What are her/his healthy and unhealthy sides?
            """
    
    # Use the local_mistral function to get the analysis from the model
    answer = local_mistral(prompt)

    # Return the generated answer
    return answer


# Example usage of the function
person = "Mark Rutte"
mark_example = characteristics_of(person)
print(mark_example)

 Mark Rutte is a Dutch politician who has served as Prime Minister of the Netherlands since October 2010 and Leader of the People's Party for Freedom and Democracy (VVD) since February 2006. Here, I will discuss his motivations, values, behaviors, and both positive and negative aspects of him based on publicly available information:

Motivations:
- Public service: Rutte has expressed a strong commitment to serving the Dutch people, focusing on issues like economic growth, fiscal responsibility, and social cohesion.
- Political career advancement: Like many politicians, Rutte is motivated by the desire to achieve higher positions within his political party and government.
- Policy implementation: As a policy wonk, Rutte values the successful implementation of his political ideologies and agendas.

Values:
- Liberal democracy: Rutte believes in upholding liberal democratic values such as individual freedom, rule of law, and a strong economy.
- Fiscal responsibility: He places great impor

This all looks quite accurate. If we run this code on Mark Rutte, it reflects the qualities that he has showcased while being the Prime Minister of our country, and it shows at the same time why he became a disputed person.


There are a few things to consider. When we ask a language model explicitly about the characteristics of person X, the name of X will unavoidably appear in the answer. For example, when asking about Mark Rutte, the answer will be of the kind "*Mark Rutte* was the prime minister of the Netherlands". In addition, the answer contains activities and achievements that are distinctive enough to trace the output back to the person, such as "... he is the only leader of a country that goes to his work by bike...". Simply deleting the name from the output before passing it to the next method does not solve this.
This is bad for the purpose of this code, since it might bias the outcome by revealing the person and, who knows, the person's astrological sign, since many internet articles make astrological analyses of celebrities. Ideally, the code is "blindfolded" and makes its guesses without seeing the person.

The next step is to keep the set of qualities from the first prompt but detach them from the person and from the specific actions and achievements that could help trace the characteristics back to the person used as input.

We have to be quite clear to the language model so that it does what we want. For this, we pass the previous answer as context for the next text generation. The method also takes the person as an argument, because we want to tell the language model explicitly that this name should not be used in any way in the generated answer.

In [4]:
def unpersonal_characteristics(characteristics, person):
    """
    Generates a general overview of characteristics based on given positive and negative traits,
    while making no reference to the specified person.

    Args:
        characteristics (str): The positive and negative traits to be summarized.
        person (str): The name of the person to be excluded from the summary.

    Returns:
        str: A general overview of characteristics without referencing the person.
    """
    # Create a prompt string with the specified characteristics and person name
    prompt = f"""
    positive and negative traits: "{characteristics}"

    Based on these positive and negative traits, make
    a general overview of characteristics while making no reference to {person}. Just summarize the characteristics 
    of the person. So don't mention in any way examples of his/her 
    behavior. Keep it short.
    """
    
    # Use the local_mistral function to get the summary from the model
    answer = local_mistral(prompt)

    # Return the generated summary
    return answer


unpersonal_mark = unpersonal_characteristics(mark_example, person)
print(unpersonal_mark)

 This individual is a dedicated political figure with a strong commitment to public service, focusing on economic growth, fiscal responsibility, and social cohesion. Their motivations extend beyond just their career, encompassing the desire for policy implementation. They value liberal democracy, fiscal responsibility, and social harmony in their society. Known for maintaining a calm demeanor and pragmatic decision-making, they demonstrate effective negotiation skills both domestically and internationally.

On the positive side, this person brings stability to their government, fosters economic growth, and utilizes diplomatic skills effectively. However, they may lack charisma when compared to other political leaders, which could impact public support or inspiration. Some critics argue that they might be perceived as inflexible in their beliefs and struggle with compromising on key issues. Additionally, there is a suggestion that they may have limited political vision, focusing primari

Now we can have the language model assign an astrological sign based on the text with characteristics generated in the previous method. This rests on the assumption that, because there is a lot of information about astrology on the internet, the model will pick the sign that astrology associates with the traits. To check this, here are lists of characteristics for the signs Pisces and Aquarius, and we ask the model which sign it thinks each one is.

In [32]:
def assign_zodiac_to(traits):
    """
    Assigns an astrological sign to a person based on their traits.

    Args:
        traits (str): The traits of the person.

    Returns:
        str: The predicted astrological sign.
    """
    # Build the prompt; the answer format asks for a single word
    prompt = f""" traits: "{traits}"
            question: "What could be the astrology sign of this person based on these traits?"
            answer: [just answer with one word, for example: "Pisces", "Virgo", not two!]"""

    # Ask the local model for the sign
    answer = local_mistral(prompt)

    return answer


# Pisces example
pisces = """Positive Characteristics:

	•	compassionate
	•	charitable
	•	sympathetic
	•	emotional
	•	sacrificing
	•	intuitive
	•	introspective
	•	musical
	•	artistic

Negative Characteristics:

	•	procrastinating
	•	over-talkative
	•	melancholy
	•	pessimistic
	•	emotionally inhibited
	•	timid
	•	impractical
	•	indolent
	•	often feels misunderstood"""

aqarius = """Positive Characteristics:

	•	independent
	•	inventive
	•	tolerant
	•	individualistic
	•	progressive
	•	artistic
	•	scientific
	•	logical
	•	humane
	•	intellectual
	•	altruistic

Negative Characteristics:

	•	unpredictable
	•	temperamental
	•	bored by detail
	•	cold
	•	too fixed in opinions
	•	shy
	•	eccentric
	•	radical
	•	impersonal
	•	rebellious
"""

print(assign_zodiac_to(pisces)) # Pisces
print(assign_zodiac_to(aqarius)) # Sometimes Libra, sometimes Gemini, sometimes Aquarius

 Pisces
 Libra


These outcomes are roughly right. When running this cell, the first sign is always taken to be Pisces, which is correct. The second is taken to be Aquarius, the correct answer, only about half of the time; otherwise the model answers Gemini or Libra. On these two examples, assigning signs in this way therefore has a rough accuracy of 75%.
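The raw answers above start with a space, and a small model does not always keep to a single word. Below is a minimal sketch of a clean-up step, assuming we simply take the first sign name that appears in the answer. `ZODIAC_SIGNS` and `normalize_sign` are hypothetical helpers and not part of the original notebook.

```python
import re

ZODIAC_SIGNS = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]


def normalize_sign(answer):
    """Return the first zodiac sign named in a raw model answer, or None."""
    best = None  # (position in the text, sign)
    for sign in ZODIAC_SIGNS:
        match = re.search(rf"\b{sign}\b", answer, flags=re.IGNORECASE)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), sign)
    return best[1] if best else None


print(normalize_sign(" Libra"))
print(normalize_sign("It could be Gemini or Aquarius."))
print(normalize_sign("I cannot tell."))
```

Answers that name no sign come back as `None`, so they can be counted separately instead of silently being scored as wrong.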

Now the methods are combined into one function.

In [6]:
def predicted_astro_sign(person):
    """
    Predicts the astrological sign of a person based on their characteristics.

    Args:
        person (str): The person whose astrological sign needs to be predicted.

    Returns:
        tuple: A tuple containing:
            - unpersonal_traits (str): The transformed unpersonal traits.
            - astro_sign (str): The predicted astrological sign.
    """
    
    # generate characteristics of the person
    characteristics = characteristics_of(person)
    
    # transform the characteristics to unpersonal traits
    unpersonal_traits = unpersonal_characteristics(characteristics, person)
    
    # draw a zodiac sign based on the traits
    astro_sign = assign_zodiac_to(unpersonal_traits)
    
    return (unpersonal_traits, astro_sign)

# 2. Data Retrieval and Preprocessing

## Scraping Astro websites

Astro-Charts (astro-charts.com) offers a comprehensive overview of the astrological signs of various celebrities, including their sun signs and ascendants. With relatively little effort these can be put into a CSV file.

In [7]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

# List of astrological signs to put in the URL
astrological_signs = [
    "Aries",
    "Taurus",
    "Gemini",
    "Cancer",
    "Leo",
    "Virgo",
    "Libra",
    "Scorpio",
    "Sagittarius",
    "Capricorn",
    "Aquarius",
    "Pisces"
]

# List to store data
data = []


for sign in astrological_signs:
    source = requests.get(f"https://astro-charts.com/persons/top/{sign.lower()}/").text
    soup = BeautifulSoup(source, 'html.parser')
    for match in soup.find_all('div', class_="celeb-info"):
        name = match.find_all('p')[1].text
        data.append({"Name": name, "Sign": sign})

# Create DataFrame
sun_signs_celebs = pd.DataFrame(data)
print(len(sun_signs_celebs))

# Display DataFrame
print(sun_signs_celebs.head())



1200
                  Name   Sign
0          Celine Dion  Aries
1            Lady Gaga  Aries
2  Kourtney Kardashian  Aries
3         Mariah Carey  Aries
4     Leighton Meester  Aries


In [8]:
# Saving the DataFrame to a CSV file
csv_file_path = 'sun_signs_celebs.csv'
sun_signs_celebs.to_csv(csv_file_path, index=False)

It is also useful to collect the ascendant of each celebrity, because in astrology the ascendant is said to relate more to how other people perceive a person. This may be relevant for celebrities, since much of the information about them describes how people perceive them rather than how they actually are. So, in case the sun sign prediction goes wrong, we can give it another try with the ascendant to get better results.
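One caveat applies if a prediction were counted as a hit whenever it matches either the sun sign or the ascendant: the chance baseline rises above 1/12. Below is a small sketch of that arithmetic, assuming a guess drawn uniformly from the twelve signs. `chance_of_hit` is a hypothetical helper.

```python
from fractions import Fraction


def chance_of_hit(sun_sign, ascendant):
    """Probability that a uniformly random guess matches the sun sign or the ascendant."""
    accepted = {sun_sign, ascendant}  # one element if both signs are the same
    return Fraction(len(accepted), 12)


same = chance_of_hit("Aries", "Aries")
different = chance_of_hit("Pisces", "Aries")
print(same)       # 1/12
print(different)  # 1/6
```

Any accuracy obtained with such an either-or rule would have to be compared against this higher baseline, not against 8.3%.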

In [9]:
# List to store data
data = []

# loops over the different astrological signs
for sign in astrological_signs:

    # goes to the webpage where this sign is the ascendant
    source = requests.get(f"https://astro-charts.com/persons/planet/Ascendant/sign/{sign.lower()}/").text
    soup = BeautifulSoup(source, 'html.parser')

    # tracks the amount of celebs with this sign as ascendant
    amount_of_celebs_in_page = int(soup.find('div', class_="celeb-cat-info").span.b.text)
    
    # tracks the amount of pages to loop over
    amount_of_pages = (amount_of_celebs_in_page // 30) + 1
    
    # loop over the pages and scrape the names
    for page_number in range(amount_of_pages):
        source = requests.get(f"https://astro-charts.com/persons/planet/Ascendant/sign/{sign.lower()}/page/{page_number}/").text
        soup = BeautifulSoup(source, 'html.parser')

        for name in soup.find_all('div', class_="flex-inner celeb-info"):
            celeb = name.p.text
            data.append({"Name": celeb, "Ascendant": sign})


ascendant_df = pd.DataFrame(data)
print(len(ascendant_df))

# Display DataFrame
print(ascendant_df.head())



3990
                  Name Ascendant
0         Heath Ledger     Aries
1       Sofia Boutella     Aries
2  Bryce Dallas Howard     Aries
3          Brie Larson     Aries
4     Natalie Martinez     Aries


In [10]:
# Saving the DataFrame to a CSV file
csv_file_path = 'ascendant_celebs.csv'
ascendant_df.to_csv(csv_file_path, index=False)
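A small note on the page count in the ascendant scraper: `(n // 30) + 1` requests one page more than needed whenever `n` is an exact multiple of 30. Below is a sketch of ceiling division, assuming 30 names per page as in the cell above. Whether the site numbers its pages from 0 or from 1 is not checked here, and `pages_needed` is a hypothetical helper.

```python
def pages_needed(total_items, per_page=30):
    """Number of pages required to list total_items (ceiling division)."""
    return -(-total_items // per_page)


for n in (0, 29, 30, 31):
    print(n, pages_needed(n))
```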

To make working with the data easier, it is good to put all the extracted data in the same CSV file.

In [11]:
import pandas as pd

# Merge the sun sign dataframe with the ascendant dataframe

# Load the first CSV file as df1
df1_path = 'sun_signs_celebs.csv'
sun_signs_celebs = pd.read_csv(df1_path)

# Load the second CSV file as df2
df2_path = 'ascendant_celebs.csv'
ascendant_df = pd.read_csv(df2_path)

# Left-merge the sun signs onto the ascendant table, matching on the name
df_merged_from_ascendant = ascendant_df.merge(sun_signs_celebs, on='Name', how='left')
print(df_merged_from_ascendant.head())

# Remove rows with NaN values
df_without_nan = df_merged_from_ascendant.dropna(subset=['Ascendant', 'Sign'])

# Group by 'Sign' and keep at most 50 names per sign
fifty_celebs_per_sign = df_without_nan.groupby('Sign').head(50)
print(len(fifty_celebs_per_sign)) # 573 in the run shown below
print(fifty_celebs_per_sign.head())

# Put the columns in the order 'Name', 'Sign', 'Ascendant'
ordered_df = fifty_celebs_per_sign[['Name', 'Sign', 'Ascendant']]

# Sort the rows by the 'Sign' column
final_df_sorted = ordered_df.sort_values(by='Sign')
print(final_df_sorted.head())

                  Name Ascendant    Sign
0         Heath Ledger     Aries   Aries
1       Sofia Boutella     Aries     NaN
2  Bryce Dallas Howard     Aries  Pisces
3          Brie Larson     Aries   Libra
4     Natalie Martinez     Aries     NaN
573
                   Name Ascendant     Sign
0          Heath Ledger     Aries    Aries
2   Bryce Dallas Howard     Aries   Pisces
3           Brie Larson     Aries    Libra
9        Kendall Jenner     Aries  Scorpio
10     Barbra Streisand     Aries   Taurus
                Name      Sign Ascendant
520   Virginia Woolf  Aquarius    Gemini
835   Farrah Fawcett  Aquarius    Cancer
903      Sharon Tate  Aquarius    Cancer
948       The Weeknd  Aquarius    Cancer
2217    Adam Lambert  Aquarius     Libra


In [12]:
# Saving the DataFrame to a CSV file
csv_file_path = 'final_df_sorted.csv'
final_df_sorted.to_csv(csv_file_path, index=False)

# 3. Data Analysis

Now we have all the ingredients to combine them together into the first analysis.

In [33]:
import time

# Initialize the new column with default values (e.g., None)
final_df_sorted['Unpersonalized description'] = None
final_df_sorted['Predicted'] = None


length = len(final_df_sorted)

def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours}h {minutes}m {seconds}s"

# Loop through the DataFrame using basic indexing
start_time = time.time()
time_stamps = []
for i in range(length):
    start_generating = time.time()
    name = final_df_sorted.iloc[i, 0]  # Get the name from the first column

    # track which number of generations we are
    print(f"Progress: {i}/{length}")

    # predict the astrology sign
    unpersonal_description, predicted_result = predicted_astro_sign(name)

    # save elapsed time 
    elapsed_time = time.time() - start_time

    # save the time that the generating took
    time_after_generating = time.time() - start_generating

    # add generating time to a list of generating times
    time_stamps.append(time_after_generating)

    # calculate average generating time
    average_generating_time = sum(time_stamps) / len(time_stamps)

    # generations to go
    to_go = length - i

    # expected time
    expected_time = format_time(to_go * average_generating_time)

    # print expected time
    print(f"the average generating time is {average_generating_time}, we have {to_go} more to go, so the expected time is {expected_time}")

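    # Store the description and the predicted sign in their own columns
    # (position 2 is 'Ascendant', so look the positions up by name)
    final_df_sorted.iloc[i, final_df_sorted.columns.get_loc('Unpersonalized description')] = unpersonal_description
    final_df_sorted.iloc[i, final_df_sorted.columns.get_loc('Predicted')] = predicted_result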

# Display the updated DataFrame
print(final_df_sorted.head())

Progress: 0/573


KeyboardInterrupt: 

```
Progress: 0/600
 Leo
the average generating time is 47.4957230091095, we have 600 more to go, so the expected time is 7h 54m 57s
Progress: 1/600
 Scorpio
the average generating time is 49.36778545379639, we have 599 more to go, so the expected time is 8h 12m 51s
Progress: 2/600
 Leo
the average generating time is 51.634344021479286, we have 598 more to go, so the expected time is 8h 34m 37s
Progress: 3/600
 Leo
the average generating time is 48.35454374551773, we have 597 more to go, so the expected time is 8h 1m 7s
Progress: 4/600
 Aries
the average generating time is 46.73957195281982, we have 596 more to go, so the expected time is 7h 44m 16s
Progress: 5/600
 Aries
the average generating time is 46.94747813542684, we have 595 more to go, so the expected time is 7h 45m 33s
Progress: 6/600
 Aries
the average generating time is 48.329658303942, we have 594 more to go, so the expected time is 7h 58m 27s
Progress: 7/600
 Aries
the average generating time is 50.37543475627899, we have 593 more to go, so the expected time is 8h 17m 52s
Progress: 8/600
...
Progress: 138/600
 Aries
the average generating time is 53.421504548985325, we have 462 more to go, so the expected time is 6h 51m 20s
Progress: 139/600
```

While the code cell above was running, I was curious what the dataframe looked like, so I tried to print it in another cell while the loop was still working on it. Of course that cell did not run. When I then stopped the cell that was supposed to print the dataframe, the running loop was stopped too. That was unfortunate, but it was also a good reason to save the data so far to a CSV file and look at the predictions and their accuracy. First, the predictions for the rows that cover the Aries sign. These seemed quite successful: Aries came in second place, with 12 of the 50 Aries celebrities predicted as Aries.

In [None]:
# Saving the DataFrame to a CSV file
csv_file_path = 'first_sample_data.csv'
final_df_sorted.to_csv(csv_file_path, index=False)
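Since each generation takes close to a minute, losing a run to an interrupt is costly. Below is a minimal sketch of one way to make the loop resumable, assuming every result is appended to a CSV file as soon as it is produced. `run_with_checkpoints`, `fake_predict` and the file name are hypothetical; in the notebook the `predict` argument would wrap `predicted_astro_sign`.

```python
import csv
import os
import tempfile


def run_with_checkpoints(names, predict, csv_path):
    """Call predict(name) per name and append each result to csv_path right away."""
    file_has_rows = os.path.exists(csv_path) and os.path.getsize(csv_path) > 0

    # Names already stored by an earlier (possibly interrupted) run
    done = set()
    if file_has_rows:
        with open(csv_path, newline="", encoding="utf-8") as f:
            done = {row["Name"] for row in csv.DictReader(f)}

    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Predicted"])
        if not file_has_rows:
            writer.writeheader()
        for name in names:
            if name in done:
                continue  # skip work that is already on disk
            writer.writerow({"Name": name, "Predicted": predict(name)})
            f.flush()  # make sure the row survives an interrupt


# Demo with a stand-in predictor instead of the language model
calls = []

def fake_predict(name):
    calls.append(name)
    return "Leo"

demo_path = os.path.join(tempfile.mkdtemp(), "checkpoint_demo.csv")
run_with_checkpoints(["A", "B"], fake_predict, demo_path)
run_with_checkpoints(["A", "B", "C"], fake_predict, demo_path)  # only "C" is new
print(calls)
```

The second call only generates the missing name, so restarting after a `KeyboardInterrupt` continues where the previous run stopped.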

In [None]:
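# Select the Aries rows by sign, so the result does not depend on row order
df_aries = final_df_sorted[final_df_sorted['Sign'] == 'Aries']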
predicted_counts = df_aries['Predicted'].value_counts()
predicted_counts

'''
Predicted
Leo          19
Aries        12
Capricorn     8
Scorpio       5
Pisces        3
Cancer        2
Libra         1
Name: count, dtype: int64
'''

But for the second batch, which covers the astrological sign Taurus, the score was very low: only 1 of the 50 was predicted as Taurus. Again Leo and Aries were on top. This probably has to do with the fact that, in all the texts, the celebrities come across as very assertive people. It is then quite logical that Aries and Leo come to the foreground, because these are also very assertive signs in astrology.

In [None]:
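# Select the Taurus rows by sign, so the result does not depend on row order
df_taurus = final_df_sorted[final_df_sorted['Sign'] == 'Taurus']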
predicted_counts = df_taurus['Predicted'].value_counts()
predicted_counts

'''Predicted
Leo          21
Aries        14
Scorpio       6
Capricorn     5
Libra         1
Cancer        1
Taurus        1
Pisces        1
Name: count, dtype: int64'''

We have not computed the accuracy score yet, but the counts already suggest that it will not end up meaningfully above chance level, which is about 0.08.

In [None]:
# Loading the CSV file back into a DataFrame
loaded_df = pd.read_csv(csv_file_path)
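# Select the Gemini rows by sign; rows without a prediction are ignored by value_counts
gemini_df = loaded_df[loaded_df['Sign'] == 'Gemini']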
predicted_counts = gemini_df['Predicted'].value_counts()
predicted_counts

'''Predicted
Leo          17
Aries        12
Capricorn     5
Scorpio       3
Libra         2
Name: count, dtype: int64'''

And for the last sign, which was not fully completed yet, Leo and Aries are again on top. So I need to take a different route: to study this properly, I need to adjust the text that the model produces. I would need to generate just two lists, one with positive and one with negative characteristics, and then use a language model combined with a vector database to draw on astrological knowledge when classifying the list of characteristics.
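The counts printed for the three batches can be turned into one accuracy figure and compared with chance. Below is a rough sketch using a one-sided binomial tail, assuming each prediction is an independent guess with success probability 1/12. The numbers are copied from the tables above (12 of 50, 1 of 50 and 0 of 39 listed predictions), and `binomial_tail` is a hypothetical helper.

```python
from math import comb

# Correct predictions and number of predictions per batch, from the tables above
batches = {
    "Aries": {"correct": 12, "total": 50},
    "Taurus": {"correct": 1, "total": 50},
    "Gemini": {"correct": 0, "total": 39},
}

correct = sum(b["correct"] for b in batches.values())
total = sum(b["total"] for b in batches.values())
accuracy = correct / total
chance = 1 / 12


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p_value = binomial_tail(correct, total, chance)
print(f"accuracy = {accuracy:.3f}, chance = {chance:.3f}, p-value = {p_value:.2f}")
```

On this partial sample the tail probability is far above 0.05, so the small surplus over 1/12 is compatible with guessing. Almost all the hits come from the Aries batch, where Aries is one of the two signs the model predicts most often anyway.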

# 4. Optimization and Improvement

### Assigning positive and negative characteristics with Few Shot Prompting

In [14]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# Initialize the LLM model "mistral"
llm = Ollama(model="mistral")


def generate_characteristics(person):

    examples = [
        {
            "human": "Lady Gaga",
            "ai": """
    Positive Characteristics:

        •	Creative
        •	Charismatic
        •	Passionate
        •	Empathetic
        •	Strong-willed
        •	Innovative
        •	Confident
        •	Inspirational

    Negative Characteristics:

        •	Impulsive
        •	Over-sensitive
        •	Stubborn
        •	Eccentric
        •	Intense
        •	Perfectionist
        •	Controversial
        •	Melancholic
    """,
        },
        {
            "human": "Paul McCartney",
            "ai": """
    Positive Characteristics:
    - Creative
    - Charismatic
    - Talented
    - Versatile
    - Inspirational
    - Compassionate
    - Humble
    - Dedicated

    Negative Characteristics:
    - Perfectionist
    - Stubborn
    - Reserved
    - Private
    - Nostalgic
    - Introverted
    - Emotional
    - Melancholic
    """,
        },
        {
            "human": "Donald Trump",
            "ai": """
    ### Positive Characteristics:
    - Confident
    - Charismatic
    - Ambitious
    - Determined
    - Persuasive
    - Bold
    - Resilient
    - Strategic

    ### Negative Characteristics:
    - Controversial
    - Impulsive
    - Stubborn
    - Egotistical
    - Blunt
    - Divisive
    - Arrogant
    - Combative
    """,
        },
        {
            "human": "Dick Schoof",
            "ai": """
    ### Positive Characteristics:
    - Analytical
    - Disciplined
    - Responsible
    - Strategic
    - Resilient
    - Diligent
    - Calm
    - Methodical

    ### Negative Characteristics:
    - Reserved
    - Strict
    - Cautious
    - Stern
    - Unyielding
    - Critical
    - Detached
    - Perfectionist
    """,
        },
        {
            "human": "Yanis Varoufakis",
            "ai": """
    ### Positive Characteristics:
    - Intelligent
    - Charismatic
    - Articulate
    - Bold
    - Innovative
    - Strategic
    - Passionate
    - Insightful

    ### Negative Characteristics:
    - Controversial
    - Stubborn
    - Eccentric
    - Confrontational
    - Outspoken
    - Radical
    - Uncompromising
    - Impulsive
    """,
        },
    ]



    # This is a prompt template used to format each individual example.
    example_prompt = ChatPromptTemplate.from_messages(
        [
            ("human", "{human}"),
            ("ai", "{ai}"),
        ]
    )
    few_shot_prompt = FewShotChatMessagePromptTemplate(
        example_prompt=example_prompt,
        examples=examples,
    )

    final_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", """You are a good analyst of positive and negative characteristics. 
             """),
            few_shot_prompt,
            ("human", "{human}"),
        ]
    )


    chain = final_prompt | llm

    return chain.invoke({"human": f"{person}"})

person = "Vincent van Gogh"
response = generate_characteristics(person)
response



" AI:\n    ### Positive Characteristics:\n    - Creative\n    - Passionate\n    - Talented\n    - Expressive\n    - Emotional\n    - Sensitive\n    - Innovative\n    - Dedicated\n\n    ### Negative Characteristics:\n    - Melancholic\n    - Impulsive\n    - Erratic\n    - Isolated\n    - Overworked\n    - Struggling with mental health issues\n    - Frustrated with lack of success during his lifetime\n    - Unstable relationships\n     The above analysis focuses on the public persona and general character traits based on information available about each individual. It's essential to remember that these are simplified summaries, and people are complex with a myriad of attributes."

Next, the name should be replaced, because otherwise the model might link the name to the astrological sign that is already known for that celebrity.

In [16]:
def replace_name(text, person, new_name="Person X"):
    """
    Replaces all exact occurrences of the person's name with new_name in the provided text.

    Parameters:
    text (str): The text in which to replace the name.
    person (str): The name to be replaced.
    new_name (str): The name to replace with. Default is "Person X".

    Returns:
    str: The text with the name replaced.
    """
    # str.replace is case-sensitive and only matches the full name as given
    return text.replace(person, new_name)

response = replace_name(response, person)
response


" AI:\n    ### Positive Characteristics:\n    - Creative\n    - Passionate\n    - Talented\n    - Expressive\n    - Emotional\n    - Sensitive\n    - Innovative\n    - Dedicated\n\n    ### Negative Characteristics:\n    - Melancholic\n    - Impulsive\n    - Erratic\n    - Isolated\n    - Overworked\n    - Struggling with mental health issues\n    - Frustrated with lack of success during his lifetime\n    - Unstable relationships\n     The above analysis focuses on the public persona and general character traits based on information available about each individual. It's essential to remember that these are simplified summaries, and people are complex with a myriad of attributes."

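One caveat: `str.replace` only matches the exact string passed as `person`, so a description that mentions just a first or last name would still leak the identity. Below is a minimal sketch of a stricter variant, assuming that names split cleanly on whitespace. The helper `replace_name_parts` is hypothetical and not part of the original notebook. Name parts shorter than four characters (such as "The" or "van") are skipped so that ordinary words are not mangled, and a name part that is also a common word could still be over-replaced.

```python
import re

def replace_name_parts(text, person, new_name="Person X"):
    """Replace the full name and each longer name part with a placeholder."""
    # Replace the full name first so it collapses into a single placeholder.
    text = re.sub(re.escape(person), new_name, text, flags=re.IGNORECASE)
    for part in person.split():
        # Skip short parts such as "The" or "van" to avoid touching ordinary words.
        if len(part) < 4:
            continue
        # Whole-word, case-sensitive match for the remaining name parts.
        text = re.sub(rf"\b{re.escape(part)}\b", new_name, text)
    return text

sample = "Woolf was introspective. Virginia Woolf valued solitude."
print(replace_name_parts(sample, "Virginia Woolf"))
```

The full name is handled before the parts so that "Virginia Woolf" becomes one placeholder rather than two.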
For the second analysis, let's not depend too much on the astrological knowledge that the model picked up from the web. Instead, every prompt given to the language model includes a list of examples, one per sign, each listing the characteristics that belong to that sign. These few-shot examples might give a stronger pull toward the right astrological analysis.

In [17]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

def characteristics_to_zodiac(characteristics):

    examples = [
        {
            "human": """
    **Positive characteristics:**
    - pioneering
    - executive
    - competitive
    - impulsive
    - eager
    - courageous
    - independent
    - dynamic
    - lives in present
    - quick

    **Negative characteristics:**
    - domineering
    - quick-tempered
    - violent
    - intolerant
    - hasty
    - arrogant
    - "me first"
    - brusque
    - lacks follow-through""",
            "ai": "Aries",
        },
        {
            "human": """

    **Positive characteristics:**
    - patient
    - conservative
    - domestic
    - sensual
    - thorough
    - stable
    - dependable
    - practical
    - artistic
    - loyal

    **Negative characteristics:**
    - self-indulgent
    - stubborn
    - slow-moving
    - argumentative
    - short-tempered
    - possessive
    - greedy
    - materialistic""",
            "ai": "Taurus",
        },
        {
            "human": """

    **Positive characteristics:**
    - dual
    - congenial
    - curious
    - adaptable
    - expressive
    - quick-witted
    - literary
    - inventive
    - dextrous
    - clever

    **Negative characteristics:**
    - changeable
    - ungrateful
    - scatterbrained
    - restless
    - scheming
    - lacking in concentration
    - lacking in follow-through""",
            "ai": "Gemini",
        },
        {
            "human": """

    **Positive characteristics:**
    - tenacious
    - intuitive
    - maternal
    - domestic
    - sensitive
    - retentive
    - helpful
    - sympathetic
    - emotional
    - patriotic
    - good memory
    - traditional

    **Negative characteristics:**
    - brooding
    - touchy
    - too easily hurt
    - negative
    - manipulative
    - too cautious
    - lazy
    - selfish
    - sorry for self""",
            "ai": "Cancer",
        },
        {
            "human": """

    **Positive characteristics:**
    - dramatic
    - idealistic
    - proud
    - ambitious
    - creative
    - dignified
    - romantic
    - generous
    - self-assured
    - optimistic

    **Negative characteristics:**
    - vain
    - status conscious
    - childish
    - overbearing
    - fears ridicule
    - cruel
    - boastful
    - pretentious
    - autocratic""",
            "ai": "Leo",
        },
        {
            "human": """

    **Positive characteristics:**
    - industrious
    - studious
    - scientific
    - methodical
    - discriminating
    - fact-finding
    - exacting
    - clean
    - humane
    - seeks perfection

    **Negative characteristics:**
    - critical
    - petty
    - melancholy
    - self-centered
    - fears disease and poverty
    - picky
    - pedantic
    - skeptical
    - sloppy""",
            "ai": "Virgo",
        },
        {
            "human": """

    **Positive characteristics:**
    - cooperative
    - persuasive
    - companionable
    - peace-loving
    - refined
    - judicial
    - artistic
    - diplomatic
    - sociable
    - suave

    **Negative characteristics:**
    - fickle
    - apathetic
    - loves intrigue
    - peace at any price
    - pouting
    - indecisive
    - easily deterred""",
            "ai": "Libra",
        },
        {
            "human": """

    **Positive characteristics:**
    - motivated
    - penetrating
    - executive
    - resourceful
    - determined
    - scientific
    - investigative
    - probing
    - passionate
    - aware

    **Negative characteristics:**
    - vengeful
    - temperamental
    - secretive
    - overbearing
    - violent
    - sarcastic
    - suspicious
    - jealous
    - intolerant""",
            "ai": "Scorpio",
        },
        {
            "human": """

    **Positive characteristics:**
    - straightforward
    - philosophical
    - freedom-loving
    - broadminded
    - athletic
    - generous
    - optimistic
    - just
    - religious
    - scholarly
    - enthusiastic

    **Negative characteristics:**
    - argumentative
    - exaggerative
    - talkative
    - procrastinating
    - self-indulgent
    - blunt
    - impatient
    - a gambler
    - pushy
    - hot-headed""",
            "ai": "Sagittarius",
        },
        {
            "human": """

    **Positive characteristics:**
    - cautious
    - responsible
    - scrupulous
    - conventional
    - businesslike
    - perfectionist
    - traditional
    - practical
    - hardworking
    - economical
    - serious

    **Negative characteristics:**
    - egotistic
    - domineering
    - unforgiving
    - fatalistic
    - the mind rules the heart
    - stubborn
    - brooding
    - inhibited
    - status-seeking""",
            "ai": "Capricorn",
        },
        {
            "human": """

    **Positive characteristics:**
    - independent
    - inventive
    - tolerant
    - individualistic
    - progressive
    - artistic
    - scientific
    - logical
    - humane
    - intellectual
    - altruistic

    **Negative characteristics:**
    - unpredictable
    - temperamental
    - bored by detail
    - cold
    - too fixed in opinions
    - shy
    - eccentric
    - radical
    - impersonal
    - rebellious""",
            "ai": "Aquarius",
        },
        {
            "human": """

    **Positive characteristics:**
    - compassionate
    - charitable
    - sympathetic
    - emotional
    - sacrificing
    - intuitive
    - introspective
    - musical
    - artistic

    **Negative characteristics:**
    - procrastinating
    - over-talkative
    - melancholy
    - pessimistic
    - emotionally inhibited
    - timid
    - impractical
    - indolent
    - often feels misunderstood""",
            "ai": "Pisces",
        },
    ]



    # This is a prompt template used to format each individual example.
    example_prompt = ChatPromptTemplate.from_messages(
        [
            ("human", "{human}"),
            ("ai", "{ai}"),
        ]
    )
    few_shot_prompt = FewShotChatMessagePromptTemplate(
        example_prompt=example_prompt,
        examples=examples,
    )

    final_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a good astrologer. Make sure to answer just in one word."),
            few_shot_prompt,
            ("human", "{human}"),
        ]
    )


    chain = final_prompt | llm

    return chain.invoke({"human": f"{characteristics}"})

response = characteristics_to_zodiac(response)
response

' Scorpio'
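To make explicit what the few-shot template sends to the model, here is a plain-Python sketch of the same message sequence: one system message, then a human/ai pair for every example, then the new description as the final human message. It does not use LangChain, the function `build_messages` is hypothetical, and the two shortened demo examples are illustrative only.

```python
def build_messages(examples, characteristics):
    """Assemble the chat turns in the order the few-shot prompt uses."""
    messages = [
        ("system", "You are a good astrologer. Make sure to answer just in one word.")
    ]
    for example in examples:
        # Each example becomes a question/answer pair the model can imitate.
        messages.append(("human", example["human"]))
        messages.append(("ai", example["ai"]))
    # The description to classify comes last, with no answer attached.
    messages.append(("human", characteristics))
    return messages

demo_examples = [
    {"human": "- pioneering\n- competitive", "ai": "Aries"},
    {"human": "- patient\n- dependable", "ai": "Taurus"},
]
messages = build_messages(demo_examples, "- creative\n- melancholic")
for role, content in messages:
    print(f"{role}: {content!r}")
```

With all twelve examples from the cell above, this layout yields 26 messages per prompt, which is part of the reason each generation takes a while on a local model.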

So, let's come back to the data frame that we made with the Sun signs and the Ascendants, and see if we can again generate predicted astrological signs, but now with the improved methods.

In [18]:
final_df_sorted.head()

Unnamed: 0,Name,Sign,Ascendant
520,Virginia Woolf,Aquarius,Gemini
835,Farrah Fawcett,Aquarius,Cancer
903,Sharon Tate,Aquarius,Cancer
948,The Weeknd,Aquarius,Cancer
2217,Adam Lambert,Aquarius,Libra


In [19]:
import pandas as pd

# Load the final DataFrame from the CSV file
final_df_sorted = pd.read_csv('final_df_sorted.csv')  


In [22]:
import time

# Initialize the new columns with default values
final_df_sorted['Characteristics'] = None
final_df_sorted['Predicted'] = None


length = len(final_df_sorted)

def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours}h {minutes}m {seconds}s"

# Loop through the DataFrame using basic indexing
start_time = time.time()
time_stamps = []
for i in range(length):
    start_generating = time.time()
    name = final_df_sorted.iloc[i, 0]  # Get the name from the first column

    # track which number of generations we are
    print(f"Progress: {i}/{length}")

    # generate characteristics
    characteristics = generate_characteristics(name)

    # replace name
    characteristics = replace_name(characteristics, name)

    # generate zodiac sign
    zodiac = characteristics_to_zodiac(characteristics)

    final_df_sorted.iloc[i, 3] = characteristics  # Store the depersonalized traits in the fourth column ('Characteristics')
    final_df_sorted.iloc[i, 4] = zodiac  # Store the predicted sign in the fifth column ('Predicted')


    # save elapsed time 
    elapsed_time = time.time() - start_time

    # save the time that the generating took
    time_after_generating = time.time() - start_generating

    # add generating time to a list of generating times
    time_stamps.append(time_after_generating)

    # calculate average generating time
    average_generating_time = sum(time_stamps) / len(time_stamps)

    # generations to go
    to_go = length - i

    # expected time
    expected_time = format_time(to_go * average_generating_time)

    # print expected time
    print(f"the average generating time is {average_generating_time}, we have {to_go} more to go, so the expected time is {expected_time}")

# Display the updated DataFrame
print(final_df_sorted.head())

# Saving the DataFrame to a CSV file
csv_file_path = 'zodiacs_of_celebs.csv'
final_df_sorted.to_csv(csv_file_path, index=False)

Progress: 0/573
the average generating time is 24.988797903060913, we have 573 more to go, so the expected time is 3h 58m 38s
Progress: 1/573
the average generating time is 25.781315088272095, we have 572 more to go, so the expected time is 4h 5m 46s
Progress: 2/573
the average generating time is 25.23803170522054, we have 571 more to go, so the expected time is 4h 0m 10s
Progress: 3/573
the average generating time is 24.75097405910492, we have 570 more to go, so the expected time is 3h 55m 8s
Progress: 4/573
the average generating time is 25.86875185966492, we have 569 more to go, so the expected time is 4h 5m 19s
Progress: 5/573
the average generating time is 27.864726901054382, we have 568 more to go, so the expected time is 4h 23m 47s
Progress: 6/573
the average generating time is 28.41473947252546, we have 567 more to go, so the expected time is 4h 28m 31s
Progress: 7/573
the average generating time is 28.70038577914238, we have 566 more to go, so the expected time is 4h 30m 44s
...
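The time estimate in the log can be checked by hand. The sketch below reuses `format_time` from the cell above and reproduces the first logged estimate from the first logged generating time. The helper `estimate_remaining` is hypothetical. It counts only rows that are not yet processed, whereas the loop uses `to_go = length - i`, which still includes the row that has just finished and therefore overestimates by one row.

```python
def format_time(seconds):
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours}h {minutes}m {seconds}s"

def estimate_remaining(durations, total):
    """Average duration so far times the number of rows not yet processed."""
    average = sum(durations) / len(durations)
    remaining = total - len(durations)
    return remaining * average

first_duration = 24.988797903060913  # first generating time from the log above

# Estimate as computed in the loop: all 573 rows counted as remaining.
print(format_time(573 * first_duration))

# Estimate that excludes the row that has already been processed.
print(format_time(estimate_remaining([first_duration], 573)))
```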

# 5. Accuracy Evaluation and Analysis

Before analyzing the accuracy, we clean the sign columns so that only valid sign names remain.

In [28]:
# Define the list of valid astrological signs
valid_signs = [
    'aries', 'taurus', 'gemini', 'cancer', 'leo', 'virgo',
    'libra', 'scorpio', 'sagittarius', 'capricorn', 'aquarius', 'pisces'
]

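# Function to clean the signs
def clean_sign(sign):
    # Missing values (None or NaN) are not strings and cannot be valid signs
    if not isinstance(sign, str):
        return None
    sign = sign.lower().strip()
    if sign in valid_signs:
        return sign
    return None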

# Apply the cleaning function to the 'Sign' and 'Predicted' columns
final_df_sorted['Sign'] = final_df_sorted['Sign'].apply(clean_sign)
final_df_sorted['Predicted'] = final_df_sorted['Predicted'].apply(clean_sign)
final_df_sorted['Ascendant'] = final_df_sorted['Ascendant'].apply(clean_sign)

# Display the updated DataFrame
print(final_df_sorted.head())


             Name      Sign Ascendant  \
0  Virginia Woolf  aquarius    gemini   
1  Farrah Fawcett  aquarius    cancer   
2     Sharon Tate  aquarius    cancer   
3      The Weeknd  aquarius    cancer   
4    Adam Lambert  aquarius     libra   

                                     Characteristics Predicted  Random_Sign  \
0   AI:\n\n    ### Positive Characteristics:\n   ...   scorpio        libra   
1   AI:\n    ### Positive Characteristics:\n    -...       leo     aquarius   
2   ### Positive Characteristics:\n- Gentle\n- Gr...      None          leo   
3   AI:\n    ### Positive Characteristics:\n    -...   scorpio        libra   
4   AI:\n    ### Positive Characteristics:\n    -...     aries  sagittarius   

  Random_Ascendant  
0      sagittarius  
1           taurus  
2            aries  
3              leo  
4           gemini  
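Note that `clean_sign` accepts only replies that consist of exactly one sign name, so a reply such as "Scorpio." or a full sentence ends up as `None` and counts as a wrong prediction. A more lenient alternative, which was not used for the results reported here, is sketched below. The helper `extract_sign` is hypothetical. It accepts a reply only when exactly one sign name appears in it.

```python
import re

VALID_SIGNS = [
    'aries', 'taurus', 'gemini', 'cancer', 'leo', 'virgo',
    'libra', 'scorpio', 'sagittarius', 'capricorn', 'aquarius', 'pisces'
]
# Whole-word match, so that "leo" inside another word is not picked up.
SIGN_PATTERN = re.compile(r"\b(" + "|".join(VALID_SIGNS) + r")\b")

def extract_sign(reply):
    """Return the single sign named in the reply, or None if zero or several are named."""
    if not isinstance(reply, str):
        return None
    found = set(SIGN_PATTERN.findall(reply.lower()))
    return found.pop() if len(found) == 1 else None

for reply in [" Scorpio", "Scorpio.", "Either Leo or Aries", None]:
    print(repr(reply), "->", extract_sign(reply))
```

Rejecting replies that name several signs keeps the evaluation strict: the model still has to commit to one answer.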


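What is the significance of this score? We can only find out by comparing it with a baseline: randomly assign zodiac signs in place of the Sun sign and the Ascendant, and compute the accuracy of the predictions against those random signs. Repeating this many times (the loop below runs once per row of the data frame) and averaging the results gives the accuracy expected by chance. If the real accuracy is around the same as this baseline, the predictions are no better than random.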

In [31]:
import numpy as np

accuracy_sun_list = []
accuracy_ascendant_list = []

for i in range(len(final_df_sorted)):

    # Randomly assign zodiac signs to the 'Random_Sign' and 'Random_Ascendant' columns
    final_df_sorted['Random_Sign'] = np.random.choice(valid_signs, size=len(final_df_sorted))
    final_df_sorted['Random_Ascendant'] = np.random.choice(valid_signs, size=len(final_df_sorted))
    
    # create a list of accuracy predictions in relation to the sun sign
    print(final_df_sorted['Random_Sign'])
    print(final_df_sorted['Predicted'])
    accuracy_sun_sign = final_df_sorted['Random_Sign'] == final_df_sorted['Predicted']
    accuracy_score_sun = sum(accuracy_sun_sign) / len(final_df_sorted)
    accuracy_sun_list.append(accuracy_score_sun)

    # create a list of accuracy predictions in relation to the ascendant sign
    accuracy_ascendant = final_df_sorted['Random_Ascendant'] == final_df_sorted['Predicted']
    accuracy_score_ascendant = sum(accuracy_ascendant) / len(final_df_sorted)
    accuracy_ascendant_list.append(accuracy_score_ascendant)

print(f'the accuracy score for sun signs with randomly distributed signs is {sum(accuracy_sun_list)/ len(accuracy_sun_list)}')
print(f'the accuracy score for ascendant signs with randomly distributed signs is {sum(accuracy_ascendant_list)/ len(accuracy_ascendant_list)}')



0           taurus
1          scorpio
2      sagittarius
3        capricorn
4           taurus
          ...     
568       aquarius
569          aries
570         cancer
571         cancer
572       aquarius
Name: Random_Sign, Length: 573, dtype: object
0      scorpio
1          leo
2         None
3      scorpio
4        aries
        ...   
568      aries
569        leo
570      aries
571      aries
572       None
Name: Predicted, Length: 573, dtype: object
0           gemini
1      sagittarius
2           cancer
3           taurus
4        capricorn
          ...     
568         taurus
569            leo
570        scorpio
571      capricorn
572      capricorn
Name: Random_Sign, Length: 573, dtype: object
0      scorpio
1          leo
2         None
3      scorpio
4        aries
        ...   
568      aries
569        leo
570      aries
571      aries
572       None
Name: Predicted, Length: 573, dtype: object
...
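The simulated baseline can be cross-checked analytically. If every prediction were an independent guess with a 1/12 chance of being right, the number of correct predictions would follow a binomial distribution. The sketch below, using only the standard library, computes the expected chance accuracy, its standard deviation for 573 rows, and the probability of reaching a given number of hits by luck. The independence assumption and the value of `hypothetical_hits` are illustrative assumptions, not results from this study.

```python
from math import comb, sqrt

def chance_baseline(n, p=1 / 12):
    """Expected accuracy and its standard deviation under pure guessing."""
    return p, sqrt(p * (1 - p) / n)

def binomial_tail(k, n, p=1 / 12):
    """Probability of k or more correct predictions out of n by chance alone."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n = 573  # number of rows in the data frame
expected, spread = chance_baseline(n)
print(f"chance accuracy: {expected:.3f} +/- {spread:.3f}")

hypothetical_hits = 60  # illustrative value only, not a result from this study
tail = binomial_tail(hypothetical_hits, n)
print(f"P(at least {hypothetical_hits} hits by chance) = {tail:.4f}")
```

For 573 rows the chance level is about 8.3% with a standard deviation of roughly 1.2 percentage points, so an observed accuracy within a couple of points of 8.3% is fully compatible with guessing.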

# 6. Results and Conclusions

Because the accuracy score of the predictions is almost identical to the score obtained with randomly assigned astrological signs, the only reasonable conclusion is that this way of predicting astrological signs does not work at all.

This is no news to anyone who is highly sceptical of non-scientific fields of study such as astrology. Someone who strongly believes in correlations between planetary positions and the human psyche might critique this research by arguing that the character analysis of celebrities is too superficial. They might assert that celebrities often operate on the basis of a crafted image or a set of behaviors designed for success, which may not reflect their deeper personal struggles. They could say that astrology does not necessarily depict a person's main characteristics as perceived by that person or by others, but rather a theme that the person goes through in this life. Such people would be very sceptical that a large language model could ever assign an astrological sign, unless perhaps it were trained on highly open and raw transcripts of sessions that a celebrity might have with a therapist, and even then it is very questionable.