# AI in Marketing

## Introduction

Marketing teams across industries are already weaving generative AI into day‑to‑day workflows, from AI copywriting assistants that draft product descriptions to predictive tools that micro‑segment audiences in real time. As McKinsey observes, “generative AI is poised to be a catalyst for a new age of marketing capabilities” (Harkness et al., 2023, para. 1). The economic stakes are substantial: the same analysis estimates that generative AI could raise marketing productivity by 5–15 percent, worth roughly US \$463 billion annually, and that generative AI overall may unlock up to US \$4.4 trillion in global value. Early adopters report double‑digit lifts in click‑through rate, conversion, and content velocity, which suggests that AI‑enabled competitors can quickly outpace traditional approaches.

Generative AI’s impact on marketing coalesces around four capability pillars: targeting, personalization, content generation, and ad optimization. **Targeting** systems draw on user‑modeling methods that “predict a user’s response to specific ad content” from browsing, purchase, and social footprints (Gao et al., 2019, p. 4). **Personalization** builds on those insights through recommendation engines that score each impression against a consumer’s likely preferences and emotional state, serving ads that resonate and boost engagement. **Content generation** uses large language models (LLMs) and other generative models to produce copy, visuals, and scripts at scale while preserving brand voice and experimenting with tonal variants. Finally, **ad optimization** closes the loop by dynamically selecting placements, bids, and creative variants to maximize return on investment. Taken together, targeting and ad optimization revolve around data‑driven matching, whereas personalization and content generation rely on the creative flexibility that LLMs are particularly well suited to provide.
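To make the ad‑optimization loop concrete, the sketch below frames creative selection as a multi‑armed bandit solved with Thompson sampling. This is one common technique for the problem, not one prescribed by the sources cited here. The variant click‑through rates are hypothetical, and the reward is simplified to a click rather than return on investment; a ROI‑oriented version would weight each click by its value net of cost.

```python
import random


def thompson_select(successes, failures, rng):
    """Pick the creative variant whose sampled click-through rate is highest.

    Each variant's CTR has a Beta(1 + clicks, 1 + non-clicks) posterior,
    i.e. a uniform Beta(1, 1) prior updated with observed outcomes.
    """
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_campaign(true_ctrs, rounds, seed=0):
    """Simulate serving impressions; true_ctrs are hypothetical and hidden from the policy."""
    rng = random.Random(seed)
    k = len(true_ctrs)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        clicked = rng.random() < true_ctrs[arm]  # simulated user response
        if clicked:
            successes[arm] += 1
        else:
            failures[arm] += 1
    impressions = [s + f for s, f in zip(successes, failures)]
    return impressions, successes


if __name__ == "__main__":
    impressions, clicks = run_campaign([0.02, 0.10, 0.04], rounds=10_000)
    print("impressions per variant:", impressions)
    print("clicks per variant:", clicks)
```

Because each variant's posterior is sampled rather than averaged, under‑tested creatives still receive occasional impressions; this is how the policy keeps exploring while shifting most traffic toward the strongest variant.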

This study therefore concentrates on the two pillars where LLMs have the most direct, text‑centric impact: personalization and content generation. These pillars combine clear business upside (higher relevance, stronger persuasion, faster creative iteration) with the distinctive linguistic strengths of LLMs in style transfer, tone control, and idea expansion. Accordingly, we pose a guiding research question: **How effectively do frontier and open‑source LLMs tailor marketing messages to fine‑grained consumer profiles, and which models excel at which specific marketing tasks?** By answering this question, the paper offers marketers an evidence‑based roadmap for selecting the model‑task pairings that translate generative‑AI hype into measurable campaign lift.

## Literature Review
### Methodological Landscape
Four studies illustrate how scholars are beginning to interrogate LLM‑driven personalization and content generation through distinct research designs and metrics. Matz et al. (2024) conducted four preregistered experiments (N ≈ 4,100) in which ChatGPT‑3.5 produced ads that were either trait‑matched or generic; message effectiveness was captured with 7‑point persuasiveness scales and incentive‑compatible willingness‑to‑pay (WTP) bids. Brand, Israeli, and Ngwe (2023) treated GPT‑3.5‑Turbo as a synthetic respondent in a conjoint survey, comparing its choice shares and WTP coefficients with parallel human data and showing that a lightweight fine‑tune further narrowed prediction error. Aguilar and García (2018) built an adaptive Facebook ad‑creation system that pairs a genetic algorithm for copy mutation with an SVM image selector; performance was logged every seven days via click‑through rate, ad‑frequency rank, and cost‑per‑click. Finally, Gao et al. (2023) used VOSviewer bibliometrics to map 241 Scopus papers across targeting, personalization, content creation, and optimization, providing a quantitative baseline of where empirical evidence is, and is not, accumulating.
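The genetic‑algorithm component of the Aguilar and García (2018) system can be illustrated with a minimal sketch. It is not their implementation: the copy components, the additive `WEIGHTS` table standing in for the click‑through rate a live campaign would report each cycle, and the population settings are all hypothetical, and the SVM image selector is omitted.

```python
import random

# Hypothetical copy components; a real system would draw these from approved brand copy.
OPTIONS = [
    ["Save 20% today", "New season, new look", "Built to last"],          # headlines
    ["Free shipping on every order.", "Made for everyday wear.", "Limited stock."],  # bodies
    ["Shop now", "Learn more", "Get yours"],                               # calls to action
]

# Hypothetical per-component click contributions, standing in for the CTR
# a live campaign would report at the end of each optimization cycle.
WEIGHTS = [
    [0.010, 0.004, 0.002],
    [0.003, 0.006, 0.001],
    [0.005, 0.002, 0.004],
]


def fitness(genome):
    """Score an ad (one option index per slot) with the stand-in CTR table."""
    return sum(WEIGHTS[slot][choice] for slot, choice in enumerate(genome))


def mutate(genome, rng):
    """Re-draw one random slot (headline, body, or CTA); it may keep the same option."""
    slot = rng.randrange(len(genome))
    child = list(genome)
    child[slot] = rng.randrange(len(OPTIONS[slot]))
    return tuple(child)


def crossover(a, b, rng):
    """Uniform crossover: each slot is inherited from one parent at random."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def evolve(pop_size=6, generations=10, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randrange(len(opts)) for opts in OPTIONS) for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # keep the best half (elitism)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


def render(genome):
    return " | ".join(OPTIONS[slot][choice] for slot, choice in enumerate(genome))


if __name__ == "__main__":
    best, history = evolve()
    print(render(best), round(fitness(best), 3))
    print(history)
```

Keeping the best half of each generation guarantees that the best observed score never decreases between cycles. In a live deployment, `fitness` would be fed by the metrics gathered during each seven‑day window rather than by a fixed lookup table.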

### What the Evidence Says About Effectiveness
Collectively, the evidence indicates that generative AI already delivers measurable uplifts when used for personalized content creation. In Matz et al. (2024), 61% of 33 trait‑matched ads outperformed their generic counterparts; for example, an iPhone message tailored to extraverts raised perceived effectiveness by 0.25 SD and increased willingness‑to‑pay by the cash equivalent of $33. Brand, Israeli, and Ngwe (2023) showed that GPT‑based synthetic respondents reproduced aggregate market‑share estimates within 1–11 percentage points of human baselines, and that a small fine‑tune on legacy surveys further tightened WTP alignment for new product features. On the executional side, Aguilar and García’s (2018) adaptive Facebook system propelled novel creatives from an average rank of 6 to the top‑two slots after six optimization cycles while lifting click‑through rate and holding cost‑per‑click steady. Industry cases summarized in Gao et al. (2023) reinforce these quantitative signals: Lexus’s AI‑scripted television spot and McCann’s “AI Creative Director” campaign both outperformed human‑written baselines on like‑through rate, and Dynamic Creative Optimization studies report double‑digit conversion gains when copy, imagery, and offers are assembled in real time from generative components. Taken together, these findings suggest that AI‑driven personalization and content generation can boost persuasive impact, predictive accuracy, and media efficiency across multiple stages of the marketing funnel.
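Two of the headline metrics above, an effect size in standard‑deviation units and a gap in percentage points, reduce to simple arithmetic. The sketch below assumes a pooled‑standard‑deviation Cohen's d, which may differ from the estimator Matz et al. actually used, and the ratings and choice shares are invented for illustration; they are not data from either study.

```python
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference: (mean_t - mean_c) / pooled sample SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd


def share_gaps_pp(synthetic, human):
    """Absolute gap per option, in percentage points, between two sets of choice shares (fractions)."""
    return {option: abs(synthetic[option] - human[option]) * 100 for option in human}


# Hypothetical 7-point persuasiveness ratings (not data from Matz et al.).
matched = [5, 6, 4, 5, 6, 5, 4, 6]
generic = [5, 5, 4, 5, 5, 4, 4, 6]

# Hypothetical choice shares for three product options (not data from Brand et al.).
synthetic_shares = {"A": 0.42, "B": 0.35, "C": 0.23}
human_shares = {"A": 0.38, "B": 0.40, "C": 0.22}

if __name__ == "__main__":
    print(f"d = {cohens_d(matched, generic):.2f}")
    gaps = share_gaps_pp(synthetic_shares, human_shares)
    print({option: round(gap, 1) for option, gap in gaps.items()})
```

With these toy inputs, d comes out at about 0.48 and the share gaps at about 4, 5, and 1 percentage points. Reading a reported "within 1–11 percentage points" range this way treats it as the spread of per‑option gaps, which is an interpretive assumption.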

### Limitations and Research Gaps
Despite this encouraging evidence, current work remains bounded in important ways. Most empirical tests rely on short text ads and self‑report scales rather than the richer creative assets (social posts, video scripts, multi‑frame stories) encountered in practice; even the sophisticated Facebook study optimizes only headline‑image‑size triads and excludes brand voice or compliance constraints. Synthetic‑respondent papers acknowledge that LLMs still struggle to capture segment‑level heterogeneity and extreme‑tail preferences, while bibliometric mapping highlights a scarcity of task‑level benchmarks inside the “content creation” cluster. Moreover, cross‑model comparisons are rare: few studies ask which LLM excels at which creative task. Responding to these gaps, the present paper concentrates exclusively on personalization and content generation, and does so at the artifact level: copywriting for ads and e‑mails, full social‑media packages, and short‑form video scripts. Our core research question therefore becomes: **How do leading LLMs differ in their ability to generate high‑quality, brand‑consistent marketing content across specific creative tasks, and which models are best suited to which jobs?** By answering this question with systematic, task‑based evaluations, we aim to extend the literature beyond generic persuasion tests toward actionable guidance for practitioners.

## Testing with LLMs
### Methodology
