# Defining an evaluation strategy for an AI application

Often, we want to evaluate an AI application with multiple evals. We can create as many evals as we want and then define an evaluation strategy that aggregates them.

In this tutorial, we will learn how to create a multi-dimensional error detection strategy using different evaluators.

# Application

We want to evaluate an AI application that helps journalists generate articles.

The user provides instructions for the article, which can include context, and the application generates an article.

# Sample data

For illustration purposes, we will use only two articles, both synthetically generated by an LLM.

In [1]:

context = """GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Tyna Eloundou, Sam Manning, Pamela Mishkin, Daniel Rock
We investigate the potential implications of large language models (LLMs), such as Generative Pre-trained Transformers (GPTs), on the U.S. labor market, focusing on the increased capabilities arising from LLM-powered software compared to LLMs on their own. Using a new rubric, we assess occupations based on their alignment with LLM capabilities, integrating both human expertise and GPT-4 classifications. Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted. We do not make predictions about the development or adoption timeline of such LLMs. The projected effects span all wage levels, with higher-income jobs potentially facing greater exposure to LLM capabilities and LLM-powered software. Significantly, these impacts are not restricted to industries with higher recent productivity growth. Our analysis suggests that, with access to an LLM, about 15% of all worker tasks in the US could be completed significantly faster at the same level of quality. When incorporating software and tooling built on top of LLMs, this share increases to between 47 and 56% of all tasks. This finding implies that LLM-powered software will have a substantial effect on scaling the economic impacts of the underlying models. We conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications.

Our study is motivated less by the progress of these models alone though, and more by the breadth,
scale, and capabilities we’ve seen in the complementary technologies developed around them. The role of
complementary technologies remains to be seen, but maximizing the impact of LLMs appears contingent
on integrating them with larger systems (Bresnahan, 2019; Agrawal et al., 2021). While the focus of our
discussion is primarily on the generative capabilities of LLMs, it is important to note that these models can
also be utilized for various tasks beyond text generation. For example, embeddings from LLMs can be used
for custom search applications, and LLMs can perform tasks such as summarization and classification where
the context may be largely contained in the prompt.
To complement predictions of technology’s impacts on work and provide a framework for understanding
the evolving landscape of language models and their associated technologies, we propose a new rubric
for assessing LLM capabilities and their potential effects on jobs. This rubric (A.1) measures the overall
exposure of tasks to LLMs, following the spirit of prior work on quantifying exposure to machine learning
(Brynjolfsson et al., 2018; Felten et al., 2018; Webb, 2020). We define exposure as a proxy for potential
economic impact without distinguishing between labor-augmenting or labor-displacing effects. We employ
human annotators and GPT-4 itself as a classifier to apply this rubric to occupational data in the U.S. economy,
primarily sourced from the O*NET database.1 2
To construct our primary exposure dataset, we collected both human annotations and GPT-4 classifications,
using a prompt tuned for agreement with a sample of labels from the authors. We observe similar agreement

In conclusion, this study offers an examination of the potential impact of LLMs on various occupations and
industries within the U.S. economy. By applying a new rubric for understanding LLM capabilities and their
potential effects on jobs, we have observed that most occupations exhibit some degree of exposure to LLMs,
with higher-wage occupations generally presenting more tasks with high exposure. Our analysis indicates that
approximately 19% of jobs have at least 50% of their tasks exposed to LLMs when considering both current
model capabilities and anticipated LLM-powered software.
Our research aims to highlight the general-purpose potential of LLMs and their possible implications for
US workers. Previous literature demonstrates the impressive improvements of LLMs to date (see 2.1). Our
findings confirm the hypothesis that these technologies can have pervasive impacts across a wide swath of
occupations in the US, and that additional advancements supported by LLMs, mainly through software and
digital tools, can have significant effects on a range of economic activities. However, while the technical
capacity for LLMs to make human labor more efficient appears evident, it is important to recognize that social,
economic, regulatory, and other factors will influence actual labor productivity outcomes. As capabilities
continue to evolve, the impact of LLMs on the economy will likely persist and increase, posing challenges for
policymakers in predicting and regulating their trajectory.
Further research is necessary to explore the broader implications of LLM advancements, including
their potential to augment or displace human labor, their impact on job quality, impacts on inequality, skill
development, and numerous other outcomes. By seeking to understand the capabilities and potential effect"""

user_instructions = """AI is here to stay and we need to understand how AI is gonna impact the job market.

Use the information in the context to write a comprehensive article.

Please provide clear paragraph separation."""

article_1 = """# AI's Impact on the Job Market: A Comprehensive Look at Large Language Models

Artificial Intelligence (AI) is no longer a futuristic concept; it's here, and its impact on the job market is becoming increasingly apparent. A recent study by researchers Tyna Eloundou, Sam Manning, Pamela Mishkin, and Daniel Rock sheds light on how large language models (LLMs), such as GPT-4, could potentially reshape the U.S. labor landscape.

## The Widespread Impact of AI on Jobs

The study reveals a startling statistic: approximately 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs. Even more striking, about 19% of workers may see at least 50% of their tasks impacted. This suggests that AI's influence on the job market is not limited to a few sectors but could have far-reaching consequences across various industries.

## Higher-Income Jobs at Greater Risk

Contrary to popular belief, it's not just low-wage jobs that are vulnerable to AI disruption. The research indicates that higher-income jobs potentially face greater exposure to LLM capabilities and LLM-powered software. This finding challenges the notion that AI will primarily affect routine, low-skill tasks and suggests a more complex transformation of the labor market.

## The Power of LLM-Powered Software

While LLMs alone are impressive, the study emphasizes the potential of software and tools built on top of these models. With access to an LLM, about 15% of all worker tasks in the U.S. could be completed significantly faster at the same level of quality. However, when incorporating LLM-powered software, this share increases dramatically to between 47 and 56% of all tasks. This highlights the importance of considering not just the AI models themselves, but also the ecosystem of tools and applications that leverage their capabilities.

## AI as a General-Purpose Technology

The researchers conclude that LLMs such as GPTs exhibit traits of general-purpose technologies, indicating that they could have considerable economic, social, and policy implications. This classification puts AI in the same category as other transformative technologies like electricity or the internet, suggesting its potential to reshape multiple sectors of the economy.

## The Need for Further Research and Policy Considerations

While the study provides valuable insights into AI's potential impact on the job market, it also highlights the need for further research. The authors stress the importance of exploring the broader implications of LLM advancements, including their potential to augment or displace human labor, impact job quality, influence inequality, and affect skill development.

As AI continues to evolve and integrate into various aspects of work, policymakers face the challenge of predicting and regulating its trajectory. The pervasive nature of AI's potential impact underscores the need for proactive policies that can help workers and industries adapt to this technological shift.

In conclusion, AI is indeed here to stay, and its impact on the job market is likely to be profound and wide-ranging. As we navigate this new landscape, it's crucial for workers, employers, and policymakers to stay informed and adaptable, ready to harness the potential of AI while mitigating its potential disruptions to the workforce."""

article_2 = """# Artificial Intelligence Might Change Everything About Jobs, New Study Claims

In a world where technology seems to be taking over everything, a new study has come out that's got everyone talking. It's all about how artificial intelligence, or AI for short, could shake up the job market in a big way. The study, done by some researchers with really long names, looks at something called "large language models" or LLMs. These are like super smart computer programs that can understand and write text, kind of like a robot writer.

So, these researchers say that these LLMs could affect a whopping 80% of workers in the United States. That's a lot of people! They think that for most of these workers, at least 10% of what they do at work could be changed by AI. And for some people, it could be even more - like half of their job tasks! Can you imagine that? It's like having a robot colleague that can do half your work.

But here's the really weird part - it's not just the simple jobs that might be affected. The study says that people with high-paying jobs might actually be more at risk. That's pretty surprising, right? You'd think it would be the other way around. I guess even the bosses aren't safe from the robot invasion!

The researchers also talked about something called "LLM-powered software." Basically, that's when you take these smart AI programs and use them to make other computer programs. They say this could make an even bigger difference. With this kind of software, almost half of all the tasks people do at work could be done faster or better. That's a lot of change!

Now, the study doesn't say exactly when all this is going to happen. It's not like we're going to wake up tomorrow and find robots sitting at our desks. But it does seem like it could be a big deal in the future. The researchers think AI could be as important as electricity was when it was invented. Remember learning about that in history class? It changed everything!

Of course, not everyone agrees about how big a deal this is going to be. Some people think the researchers are exaggerating, and that AI won't really change that much. Others are worried that it could lead to a lot of people losing their jobs. It's hard to know who's right.

The study also talks about how AI might affect different industries. But to be honest, it gets pretty complicated and boring at that point. There's a lot of technical stuff about "productivity growth" and "economic implications" that I didn't really understand. I guess that's why these researchers get paid the big bucks!

In the end, the main takeaway seems to be that AI is coming, and it's going to change things. Whether that's good or bad probably depends on your job and how well you can adapt to working with robot helpers. Maybe it's time to start being extra nice to your computer, just in case!

The researchers say we need to do more studies to really understand what's going to happen. They want to look at things like how AI might make some jobs better or worse, and how it might affect inequality. That sounds like a lot more work for them!

So, there you have it. AI is coming for our jobs, maybe. Or maybe not. It's all very exciting and confusing at the same time. I guess we'll just have to wait and see what happens. In the meantime, maybe we should all start learning how to program these AI things. You know, just in case."""

# Evaluation criteria and rubrics

We want to evaluate the articles based on the following criteria:

| Evaluation | Explanation |
|------------|-------------|
| Completeness | Does the article provide comprehensive coverage of the topics and information specified in the instructions, addressing all relevant aspects and key points? |
| Clarity | Is the article written in a clear, concise, and easily understandable manner? |
| Source Attribution | Does the article properly attribute information to reliable sources provided in the information and the instructions? |
| Objectivity | Does the article present information in an unbiased manner, considering multiple perspectives? |

These metrics measure the quality of the generated article.

Since we want to create a multi-dimensional error detection system, we will define each rubric on a pass/fail scale (0 = fail, 1 = pass).

Let's create the custom rubrics.

In [2]:
from flow_eval.lm import LMEval, RubricItem

# Input and output columns shared by all the metrics
required_inputs = ["user_instructions", "context"]
required_output = "article"

completeness_criteria = "Evaluate the extent to which the article provides comprehensive coverage of all topics, key points, and information specified in the instructions, ensuring that no relevant aspects are omitted or inadequately addressed."
completeness_rubric = [
    RubricItem(
        score=0,
        description="The article fails to provide comprehensive coverage of the required topics, key points, and information specified in the instructions. It omits crucial information and has significant gaps in addressing relevant aspects."
    ),
    RubricItem(
        score=1,
        description="The article offers comprehensive coverage of all required topics, key points, and information specified in the instructions. It thoroughly addresses all relevant aspects, providing in-depth information and leaving no significant gaps in coverage."
    )
]

completeness_eval = LMEval(
    name="completeness",
    criteria=completeness_criteria,
    rubric=completeness_rubric,
    input_columns=required_inputs,
    output_column=required_output
)

clarity_criteria = "Does the article's writing quality in terms of clarity, conciseness, and ease of understanding communicate effectively the information to the reader?"
clarity_rubric = [
    RubricItem(
        score=0,
        description="The article's writing quality is poor to moderate in terms of clarity, conciseness, and ease of understanding. It may have confusing sentence structures, inappropriate vocabulary, lack of organization, or instances of unnecessary verbosity. The writing does not effectively communicate the information to the reader, making it difficult to comprehend the content without significant effort."
    ),
    RubricItem(
        score=1,
        description="The article's writing quality is high in terms of clarity, conciseness, and ease of understanding. It features well-constructed sentences, appropriate vocabulary, logical organization, and efficient conveyance of information. The writing effectively communicates the information to the reader, allowing for easy comprehension and a smooth reading experience."
    )
]
clarity_eval = LMEval(
    name="clarity",
    criteria=clarity_criteria,
    rubric=clarity_rubric,
    input_columns=required_inputs,
    output_column=required_output
)

source_attribution_criteria = "Does the article accurately and comprehensively attribute information to reliable sources, ensuring that these sources align with those provided in the information and instructions?"
source_attribution_rubric = [
    RubricItem(
        score=0,
        description="The article fails to accurately and comprehensively attribute information to reliable sources. There are significant gaps or inaccuracies in attribution, and many sources either do not align with those provided in the instructions or are unreliable. Attribution practices are inconsistent or inadequate, with key information often lacking proper sourcing."
    ),
    RubricItem(
        score=1,
        description="The article accurately and comprehensively attributes information to reliable sources that align with those provided in the information and instructions. Attribution practices are consistently followed throughout the article, with all key information properly sourced and credited. The sourcing is appropriate and demonstrates excellent adherence to attribution standards."
    )
]
source_attribution_eval = LMEval(
    name="source_attribution",
    criteria=source_attribution_criteria,
    rubric=source_attribution_rubric,
    input_columns=required_inputs,
    output_column=required_output
)

objectivity_criteria = "Evaluate whether the article presents information in an unbiased manner by incorporating multiple perspectives fairly and avoiding partisan or one-sided reporting."
objectivity_rubric = [
    RubricItem(
        score=0,
        description="The article shows significant bias in its reporting. It either presents only one perspective or heavily favors a particular viewpoint. Alternative views are absent, minimized, or unfairly represented. The language used may be loaded or emotionally charged, and sources may be limited to those supporting a single perspective. The overall presentation lacks journalistic objectivity and balance."
    ),
    RubricItem(
        score=1,
        description="The article demonstrates a commitment to unbiased reporting. It presents multiple perspectives on the topic, giving fair representation to different viewpoints. The language used is neutral and objective, avoiding loaded terms or emotional rhetoric. The article uses a diverse range of credible sources to support various perspectives. While minor imperfections may exist, the overall presentation maintains journalistic integrity, balance, and objectivity."
    )
]
objectivity_eval = LMEval(
    name="objectivity",
    criteria=objectivity_criteria,
    rubric=objectivity_rubric,
    input_columns=required_inputs,
    output_column=required_output
)
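
Because every rubric here is meant to be pass/fail, a quick sanity check can catch a rubric that accidentally has a missing or extra score level. The following is a minimal sketch, not part of flow_eval: `is_pass_fail` is a hypothetical helper, and it assumes only that each rubric item exposes a `score` attribute, as the `RubricItem(score=..., description=...)` calls above suggest. `SimpleNamespace` stands in for `RubricItem` so that the snippet runs on its own.

```python
from types import SimpleNamespace


def is_pass_fail(rubric):
    """Return True if the rubric has exactly two items, scored 0 and 1."""
    scores = sorted(item.score for item in rubric)
    return scores == [0, 1]


# Stand-ins for RubricItem objects (assumption: items expose `.score`).
example_rubric = [
    SimpleNamespace(score=0, description="Fail description"),
    SimpleNamespace(score=1, description="Pass description"),
]
three_level_rubric = example_rubric + [
    SimpleNamespace(score=2, description="Unexpected extra level")
]

print(is_pass_fail(example_rubric))      # True
print(is_pass_fail(three_level_rubric))  # False
```
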

# Custom evaluators

We can now easily create a model and the different judges to build our multi-dimensional error detection system.

In [5]:
from flow_eval import LMEvaluator
from flow_eval.core import EvalInput
from flow_eval.lm.models import Vllm

In [6]:

# If you are running on an Ampere GPU or newer, create a model using vLLM
model = Vllm()

# The alternatives below use classes (Hf, Llamafile, Baseten) that are not imported
# above; import the one you need before uncommenting it.

# Or, if you are not running on an Ampere GPU or newer, use Hugging Face Transformers without flash attention
# model = Hf(flash_attn=False)

# Or, if you are not running on an Nvidia GPU (for example, on an Apple Silicon Mac), use Llamafile
# model = Llamafile()

# Or use Baseten if you don't want to run the model locally.
# Prerequisites:
#  - Sign up for Baseten
#  - Generate an API key at https://app.baseten.co/settings/api_keys
#  - Set the API key as an environment variable, then initialize the model:
# import os
# os.environ["BASETEN_API_KEY"] = "your_api_key"
# model = Baseten()

INFO 01-23 19:34:40 awq_marlin.py:90] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.
INFO 01-23 19:34:40 llm_engine.py:226] Initializing an LLM engine (v0.6.1.dev238+ge2c6e0a82) with config: model='flowaicom/Flow-Judge-v0.1-AWQ', speculative_config=None, tokenizer='flowaicom/Flow-Judge-v0.1-AWQ', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq_marlin, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), 

INFO 01-23 19:34:42 model_runner.py:1025] Loading model weights took 2.1717 GB
INFO 01-23 19:34:44 gpu_executor.py:122] # GPU blocks: 3076, # CPU blocks: 682


In [7]:
# Create evaluators
completeness_judge = LMEvaluator(
    eval=completeness_eval,
    model=model
)
clarity_judge = LMEvaluator(
    eval=clarity_eval,
    model=model
)
source_attribution_judge = LMEvaluator(
    eval=source_attribution_eval,
    model=model
)
objectivity_judge = LMEvaluator(
    eval=objectivity_eval,
    model=model
)

In [8]:
# Create a list of inputs and outputs
inputs_batch = [
    [
        {"user_instructions": user_instructions},
        {"context": context}
    ],
    [
        {"user_instructions": user_instructions},
        {"context": context}
    ]
]
outputs_batch = [
    {"article": article_1},
    {"article": article_2}
]
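
Each entry in `inputs_batch` is a list of single-key dictionaries, one per input column. When many samples share that shape, a small helper can reduce typing mistakes. This is a sketch under the assumption that the list-of-single-key-dicts shape shown above is what `EvalInput` expects; `to_inputs` is a hypothetical name, not a flow_eval function.

```python
def to_inputs(**fields):
    """Turn keyword arguments into a list of single-key dicts, preserving order."""
    return [{name: value} for name, value in fields.items()]


# Placeholder strings; in the tutorial these would be `user_instructions` and `context`.
sample = to_inputs(user_instructions="Write an article.", context="Some source text.")
print(sample)
```

With this helper, each element of `inputs_batch` could be written as `to_inputs(user_instructions=user_instructions, context=context)`.
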

In [9]:
# Create a list of EvalInput
eval_inputs_batch = [EvalInput(inputs=inputs, output=output) for inputs, output in zip(inputs_batch, outputs_batch)]

# Run batch evaluations for all judges
completeness_results = completeness_judge.batch_evaluate(eval_inputs_batch, save_results=False)
clarity_results = clarity_judge.batch_evaluate(eval_inputs_batch, save_results=False)
source_attribution_results = source_attribution_judge.batch_evaluate(eval_inputs_batch, save_results=False)
objectivity_results = objectivity_judge.batch_evaluate(eval_inputs_batch, save_results=False)

# Combine results
all_results = {
    "completeness": completeness_results,
    "clarity": clarity_results,
    "source_attribution": source_attribution_results,
    "objectivity": objectivity_results
}

Processed prompts: 100%|██████████| 2/2 [00:10<00:00,  5.19s/it, est. speed input: 536.90 toks/s, output: 60.73 toks/s]
Processed prompts: 100%|██████████| 2/2 [00:08<00:00,  4.25s/it, est. speed input: 667.09 toks/s, output: 78.54 toks/s]
Processed prompts: 100%|██████████| 2/2 [00:05<00:00,  2.81s/it, est. speed input: 1007.79 toks/s, output: 77.30 toks/s]
Processed prompts: 100%|██████████| 2/2 [00:09<00:00,  4.51s/it, est. speed input: 634.53 toks/s, output: 69.75 toks/s]


In [10]:
from IPython.display import display, Markdown

for i in range(len(eval_inputs_batch)):
    markdown_content = f"\n## Article {i+1}\n\n"
    markdown_content += "| Metric | Score |\n"
    markdown_content += "|--------|-------|\n"
    for metric, results in all_results.items():
        score = results[i].score
        markdown_content += f"| {metric} | {score} |\n"

    markdown_content += "\n### Feedback\n\n"
    for metric, results in all_results.items():
        feedback = results[i].feedback
        markdown_content += f"**{metric.capitalize()}**: {feedback}\n\n"

    display(Markdown(markdown_content))


## Article 1

| Metric | Score |
|--------|-------|
| completeness | 1 |
| clarity | 1 |
| source_attribution | 1 |
| objectivity | 1 |

### Feedback

**Completeness**: The article provides comprehensive coverage of the key points and information specified in the instructions. It addresses the impact of large language models (LLMs) on the job market, including the potential effects on various occupations and industries within the U.S. economy. The article covers the widespread impact of AI on jobs, the specific risks to higher-income jobs, the power of LLM-powered software, and the classification of LLMs as general-purpose technologies. Additionally, it highlights the need for further research and policy considerations, addressing the broader implications of LLM advancements. The article thoroughly addresses all relevant aspects, providing in-depth information and leaving no significant gaps in coverage.

**Clarity**: The article's writing quality is high in terms of clarity, conciseness, and ease of understanding. The content is well-structured, with clear paragraph separation and logical flow. The vocabulary used is appropriate and varied, enhancing readability.

The article effectively communicates complex information about AI's impact on the job market. It begins with an engaging introduction that sets the context, followed by well-organized sections that cover key points such as the widespread impact of AI, its effect on higher-income jobs, the power of LLM-powered software, and the broader implications of AI as a general-purpose technology.

Each section is concise yet informative, avoiding unnecessary verbosity while providing sufficient detail to convey the research findings. The writing maintains a balance between technical accuracy and accessibility, making the content understandable for a general audience.

The conclusion effectively summarizes the main points and emphasizes the need for further research and policy considerations, providing a clear takeaway for readers.

Overall, the writing quality of the article is excellent, effectively communicating the information to the reader and facilitating easy comprehension of the complex topic of AI's impact on the job market.

**Source_attribution**: The article accurately and comprehensively attributes information to reliable sources that align with those provided in the information and instructions. The attribution practices are consistently followed throughout the article, with all key information properly sourced and credited.

The article begins by mentioning the study by researchers Tyna Eloundou, Sam Manning, Pamela Mishkin, and Daniel Rock, which is directly aligned with the context provided. The statistics and findings discussed in the article are accurately attributed to this study, demonstrating excellent adherence to attribution standards.

Furthermore, the article references the potential impact of AI on job quality, inequality, and skill development, which are mentioned in the context as areas for further research. This shows that the article is building upon the provided information and maintaining consistency in its attributions.

Overall, the article demonstrates excellent attribution practices, with all key information properly sourced and credited, aligning perfectly with the provided information and instructions.

**Objectivity**: The article presents information in an unbiased manner by incorporating multiple perspectives fairly and avoiding partisan or one-sided reporting. It discusses the potential impact of large language models (LLMs) on the job market, highlighting both positive and negative aspects.

The article acknowledges the widespread impact of AI on jobs, citing statistics from the study by researchers Tyna Eloundou, Sam Manning, Pamela Mishkin, and Daniel Rock. It presents the potential risks to both low-wage and higher-income jobs, challenging the notion that AI will primarily affect routine, low-skill tasks.

The article also discusses the power of LLM-powered software, emphasizing that while LLMs alone are impressive, the ecosystem of tools and applications that leverage their capabilities is even more impactful. This demonstrates a balanced view of the technology's potential.

Furthermore, the article mentions the need for further research and policy considerations, highlighting the importance of exploring the broader implications of LLM advancements, including their potential to augment or displace human labor, impact job quality, influence inequality, and affect skill development.

The language used in the article is neutral and objective, avoiding loaded terms or emotional rhetoric. While the article primarily focuses on the study's findings, it also acknowledges the need for further research and the importance of proactive policies, showing a balanced approach to the topic.

Overall, the article maintains journalistic integrity, balance, and objectivity by presenting multiple perspectives on the topic and using a diverse range of credible sources to support various viewpoints.




## Article 2

| Metric | Score |
|--------|-------|
| completeness | 0 |
| clarity | 1 |
| source_attribution | 0 |
| objectivity | 0 |

### Feedback

**Completeness**: The article provides a general overview of the potential impact of large language models (LLMs) on the job market, but it falls short in several key areas when evaluated against the instructions and scoring rubric.

1. **Comprehensive Coverage**: The article mentions some important points such as the potential impact of LLMs on 80% of U.S. workers, the possibility of 50% of tasks being affected, and the introduction of LLM-powered software. However, it lacks depth in several areas:
   - It doesn't fully explain the concept of LLMs and their capabilities.
   - The potential effects on different industries and job types are not thoroughly explored.
   - The implications for job quality, skill development, and inequality are not discussed in detail.
   - The potential for AI to augment rather than replace human labor is not adequately addressed.

2. **Key Points and Information**: While the article touches on several key points, it fails to provide in-depth information on several crucial aspects:
   - The study's methodology and data sources are not explained.
   - The potential for AI to improve job quality and productivity is not discussed.
   - The societal and economic implications of widespread AI adoption are not thoroughly explored.

3. **Omissions and Gaps**: The article omits several important aspects:
   - It doesn't discuss the potential for AI to create new job categories or improve existing ones.
   - The role of policy and regulation in managing the transition to an AI-augmented workforce is not addressed.
   - The potential for AI to improve work-life balance and reduce repetitive tasks is not discussed.

Overall, while the article provides a basic overview of the potential impact of LLMs on jobs, it lacks the depth and comprehensiveness required to fully address the instructions and scoring rubric.

**Clarity**: The article's writing quality is generally clear and easy to understand, but it falls short in terms of conciseness and some aspects of clarity. The content is well-structured and covers the main points from the context provided, making it accessible to a broad audience.

The writing is generally clear, with sentences that are easy to follow. However, there are instances where the language becomes slightly verbose, such as in the paragraph "But here's the really weird part - it's not just the simple jobs that might be affected. The study says that people with high-paying jobs might actually be more at risk. That's pretty surprising, right? You'd think it would be the other way around. I guess even the bosses aren't safe from the robot invasion!"

The vocabulary used is appropriate for the general audience, avoiding overly technical jargon while still accurately conveying the study's findings. The organization of the article is logical, starting with an introduction to the topic, discussing the main findings, and concluding with implications and future research needs.

However, the article could have been more concise. Some sections, like the paragraph about LLM-powered software, could have been more succinctly expressed. Additionally, while the article does a good job of explaining the study's main points, it occasionally strays into unnecessary elaboration, such as when discussing the potential impacts on different industries.

Overall, the writing effectively communicates the information from the study to the reader, but there is room for improvement in terms of conciseness and avoiding verbosity.

**Source_attribution**: The article does not accurately and comprehensively attribute information to reliable sources. The text makes several claims about the impact of AI on the job market, but fails to provide proper attribution to the original study or researchers mentioned in the context. For instance, it mentions that "the study, done by some researchers with really long names, looks at something called 'large language models' or LLMs," without providing any specific names or details from the original source. Additionally, the article makes claims about the impact of AI on high-paying jobs and the potential for AI to change the job market, but does not cite the specific studies or data from the provided context that support these claims. The lack of proper sourcing and attribution throughout the article indicates that the information presented is not reliably attributed to the original sources provided in the instructions.

**Objectivity**: The article presents information about the potential impact of AI on the job market, but it shows a significant bias in its reporting. The author only presents one perspective, heavily favoring the viewpoint that AI will have a substantial impact on jobs. Alternative views, such as those suggesting that the impact may be less significant or that AI could create new job opportunities, are absent or minimized. The language used is somewhat emotional and loaded, with phrases like "robot invasion" and "whopping 80% of workers" being used to create a sense of alarm. The article does not use a diverse range of credible sources to support various perspectives, relying instead on a single study to make broad claims. Overall, the presentation lacks journalistic objectivity and balance, making it more of a one-sided opinion piece rather than an unbiased report.

Therefore, based on the evaluation criteria and scoring rubric, the article does not meet the standard for unbiased reporting.
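
One simple way to turn the per-metric scores into a single verdict is an all-must-pass rule: an article is flagged as soon as any evaluator returns 0. The sketch below is one possible aggregation strategy, not a flow_eval feature. `aggregate_pass_fail` is a hypothetical helper, and the scores are copied by hand from the result tables above so that the snippet runs without a model. With real results, the same per-metric lists could be built from `results[i].score`.

```python
def aggregate_pass_fail(scores_by_metric):
    """Combine per-metric 0/1 scores into one verdict per article.

    scores_by_metric maps a metric name to a list of scores, one per article.
    An article passes only if every metric scored 1.
    """
    n_articles = len(next(iter(scores_by_metric.values())))
    verdicts = []
    for i in range(n_articles):
        # Collect the names of all metrics this article failed.
        failed = [
            metric for metric, scores in scores_by_metric.items() if scores[i] != 1
        ]
        verdicts.append({"passed": not failed, "failed_metrics": failed})
    return verdicts


# Scores copied from the result tables above: [article 1, article 2].
scores = {
    "completeness": [1, 0],
    "clarity": [1, 1],
    "source_attribution": [1, 0],
    "objectivity": [1, 0],
}

verdicts = aggregate_pass_fail(scores)
for i, verdict in enumerate(verdicts, start=1):
    status = "PASS" if verdict["passed"] else "FAIL"
    print(f"Article {i}: {status} {verdict['failed_metrics']}")
```

Under this rule, article 1 passes and article 2 fails on completeness, source attribution, and objectivity. Keeping the list of failed metrics, rather than only a boolean, preserves the multi-dimensional signal: it shows which kind of error was detected, not just that one occurred.
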

