<a href="https://colab.research.google.com/github/pavithra64/LLM/blob/main/linkedin_llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Designing and developing a large language model (LLM) to generate human-like, professional LinkedIn posts across various themes—while ensuring complete independence from external APIs.**

**Objective:** Build an LLM that can generate contextually rich, professional, and engaging LinkedIn posts.

**Independence:** No reliance on external APIs (e.g., OpenAI, Anthropic).

**Deployment Target:** On-premises or private cloud setup for control and privacy.

**Capabilities:** The model should understand the tone, style, and audience of LinkedIn posts.

**Contextual themes:** career advice, product launches, achievements, events, hiring, etc.



**Steps**


Environment Setup

Select a Base LLM

Collect and Prepare Dataset

Preprocess Data

Fine-tune LLM with LoRA

Generate LinkedIn Posts Locally

Deploy the Model Locally

Build a Simple UI



**Environment Setup**

**Requirements**

GPU (NVIDIA with ≥16GB VRAM recommended)

Python ≥ 3.10

Linux or WSL (Windows Subsystem for Linux)

In [2]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
!pip install transformers datasets peft accelerate bitsandbytes

Looking in indexes: https://download.pytorch.org/whl/cu121





In [4]:
# Select base LLM

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# model_name = "mistralai/Mistral-7B-v0.1"  # Gated model; requires Hugging Face authentication
model_name = "gpt2"  # Openly available model; no authentication required
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Pass quantization settings through BitsAndBytesConfig; the bare load_in_8bit argument is deprecated
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)


# Collect and Prepare Dataset


In [50]:
from datasets import Dataset, DatasetDict
data = [
        {"theme": "career milestone", "tone": "humble", "persona": "Software Engineer", "input": "Started new job at Google", "output": "Thrilled to announce I've joined Google as a Software Engineer! Looking forward to this new chapter."},
        {"theme": "product launch", "tone": "exciting", "persona": "Product Manager", "input": "Launching new feature", "output": "Excited to launch our new feature that will revolutionize the way we work!"},
        {"theme": "achievement", "tone": "proud", "persona": "Data Scientist", "input": "Completed a challenging project", "output": "Successfully completed a challenging data science project, leveraging machine learning to gain valuable insights."},
        {"theme": "hiring", "tone": "enthusiastic", "persona": "Hiring Manager", "input": "We are hiring!", "output": "Our team is growing! We're looking for talented individuals to join us."},
        {"theme": "event", "tone": "invitational", "persona": "Marketing Specialist", "input": "Join our webinar", "output": "Don't miss our upcoming webinar on the latest trends in marketing. Register now!"},
        {"theme": "career advice", "tone": "helpful", "persona": "Career Coach", "input": "Tips for interviewing", "output": "Here are my top tips for acing your next job interview."},
        {"theme": "industry news", "tone": "informative", "persona": "Industry Analyst", "input": "New report on tech trends", "output": "Read our latest report on the emerging trends in the tech industry."},
        {"theme": "company culture", "tone": "authentic", "persona": "HR Manager", "input": "Our values in action", "output": "Proud to see our company values reflected in the daily work of our team."},
        {"theme": "project update", "tone": "progress-focused", "persona": "Project Lead", "input": "Project X progress", "output": "Making great progress on Project X! Stay tuned for more updates."},
        {"theme": "networking", "tone": "open", "persona": "Business Developer", "input": "Connecting with professionals", "output": "Enjoyed connecting with so many inspiring professionals at [Event Name]."},
        {"theme": "skill sharing", "tone": "educational", "persona": "Educator", "input": "Learn Python basics", "output": "Sharing a quick tutorial on Python basics for beginners."},
        {"theme": "thought leadership", "tone": "insightful", "persona": "CEO", "input": "Future of work", "output": "My thoughts on the future of work and how we can adapt."},
        {"theme": "customer success", "tone": "grateful", "persona": "Account Manager", "input": "Client success story", "output": "Celebrating a fantastic success story with our client [Client Name]!"}
]

print("Updated dataset:")
display(data)


# Convert the list of dictionaries to a Dataset
dataset = Dataset.from_list(data)


print("Dataset loaded successfully:")
display(dataset)

Updated dataset:


[{'theme': 'career milestone',
  'tone': 'humble',
  'persona': 'Software Engineer',
  'input': 'Started new job at Google',
  'output': "Thrilled to announce I've joined Google as a Software Engineer! Looking forward to this new chapter."},
 {'theme': 'product launch',
  'tone': 'exciting',
  'persona': 'Product Manager',
  'input': 'Launching new feature',
  'output': 'Excited to launch our new feature that will revolutionize the way we work!'},
 {'theme': 'achievement',
  'tone': 'proud',
  'persona': 'Data Scientist',
  'input': 'Completed a challenging project',
  'output': 'Successfully completed a challenging data science project, leveraging machine learning to gain valuable insights.'},
 {'theme': 'hiring',
  'tone': 'enthusiastic',
  'persona': 'Hiring Manager',
  'input': 'We are hiring!',
  'output': "Our team is growing! We're looking for talented individuals to join us."},
 {'theme': 'event',
  'tone': 'invitational',
  'persona': 'Marketing Specialist',
  'input': 'Join o

Dataset loaded successfully:


Dataset({
    features: ['theme', 'tone', 'persona', 'input', 'output'],
    num_rows: 13
})
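Before converting hand-written records to a `Dataset`, it can help to confirm that every record carries all five fields as non-empty strings. The following is a minimal sketch; `validate_records` and `REQUIRED_FIELDS` are hypothetical names introduced here, not part of the `datasets` library.

```python
# Field names taken from the dataset above; the helper itself is hypothetical.
REQUIRED_FIELDS = ("theme", "tone", "persona", "input", "output")

def validate_records(records):
    """Return a list of (index, problem) tuples; empty when every record is usable."""
    problems = []
    for idx, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((idx, f"missing or empty field: {field}"))
    return problems

sample = [
    {"theme": "hiring", "tone": "enthusiastic", "persona": "Hiring Manager",
     "input": "We are hiring!", "output": "Our team is growing!"},
    # Deliberately broken record: empty tone, no output
    {"theme": "event", "tone": "", "persona": "Marketing Specialist",
     "input": "Join our webinar"},
]
print(validate_records(sample))
```

Running the check on `data` before `Dataset.from_list(data)` would surface incomplete records early instead of during tokenization.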

In [51]:
# split the dataset into training and validation sets
dataset = dataset.train_test_split(test_size=0.1)
train_dataset = dataset['train']
validation_dataset = dataset['test']
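The split sizes follow from simple rounding. The sketch below assumes the scikit-learn-style convention (test rows rounded up, train rows rounded down), which is consistent with the 11/2 split this notebook reports for its 13 rows; `split_sizes` is a hypothetical helper, not a `datasets` function.

```python
import math

def split_sizes(n_rows, test_size):
    """Estimate train/test row counts for a fractional test_size (assumed rounding rule)."""
    n_test = math.ceil(test_size * n_rows)           # test rows round up
    n_train = math.floor((1 - test_size) * n_rows)   # train rows round down
    return n_train, n_test

print(split_sizes(13, 0.1))  # (11, 2)
```

With only two validation rows, any metric computed on the held-out split will be very noisy, so a larger dataset is advisable before drawing conclusions from it.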

## Preprocess Data

Tokenize the dataset to prepare it for training.

In [52]:
def tokenize_function(examples):
    # Combine the fields of each example into a single training string
    text = [
        f"Theme: {t}\nTone: {to}\nPersona: {p}\nInput: {i}\nOutput: {o}"
        for t, to, p, i, o in zip(examples["theme"], examples["tone"], examples["persona"], examples["input"], examples["output"])
    ]
    # Cap the length: padding="max_length" alone pads every example to GPT-2's full 1024-token context
    return tokenizer(text, padding="max_length", truncation=True, max_length=128)

# Set the padding token for the tokenizer
tokenizer.pad_token=tokenizer.eos_token

# Apply the tokenization function to the dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)

# You can inspect the tokenized dataset
print("Tokenized dataset:")
display(tokenized_datasets)

Map:   0%|          | 0/11 [00:00<?, ? examples/s]

Map:   0%|          | 0/2 [00:00<?, ? examples/s]

Tokenized dataset:


DatasetDict({
    train: Dataset({
        features: ['theme', 'tone', 'persona', 'input', 'output', 'input_ids', 'attention_mask'],
        num_rows: 11
    })
    test: Dataset({
        features: ['theme', 'tone', 'persona', 'input', 'output', 'input_ids', 'attention_mask'],
        num_rows: 2
    })
})
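A model fine-tuned on the `Theme / Tone / Persona / Input / Output` layout will generally respond best to prompts in that same layout. One way to keep training text and inference prompts in sync is to build both from a single template. This is a sketch; `build_prompt` and `build_training_text` are hypothetical helper names.

```python
# Field labels match the training text built in tokenize_function.
FIELDS = ("Theme", "Tone", "Persona", "Input")

def build_prompt(theme, tone, persona, event):
    """Inference prompt: the training layout with the Output field left open."""
    values = (theme, tone, persona, event)
    lines = [f"{name}: {value}" for name, value in zip(FIELDS, values)]
    lines.append("Output:")
    return "\n".join(lines)

def build_training_text(theme, tone, persona, event, output):
    """Training text: the prompt followed by the reference post."""
    return f"{build_prompt(theme, tone, persona, event)} {output}"

print(build_prompt("career switch", "confident", "Data Analyst",
                   "Transitioned to Data Scientist at Microsoft"))
```

Because the prompt ends at `Output:`, the model's continuation lands exactly where the reference post sat during fine-tuning.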

**Fine-tune LLM with PEFT + LoRA**

In [53]:
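from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

lora_config = LoraConfig(
    r=8,                        # Rank of the low-rank update matrices
    lora_alpha=32,              # Scaling factor applied to the LoRA update
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

# Prepare the 8-bit base model for training before attaching the adapters
model = prepare_model_for_kbit_training(model)
peft_model = get_peft_model(model, lora_config)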

# Print the number of trainable parameters
peft_model.print_trainable_parameters()

trainable params: 294,912 || all params: 124,734,720 || trainable%: 0.2364
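The trainable-parameter figure can be reproduced by hand. LoRA adds two small matrices per adapted layer: A with shape (r × in_features) and B with shape (out_features × r). The sketch below plugs in GPT-2 small's dimensions (12 blocks, hidden size 768, `c_attn` projecting 768 → 2304); `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(in_features, out_features, rank, n_layers):
    """LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer."""
    return n_layers * rank * (in_features + out_features)

# GPT-2 small: 12 blocks, hidden size 768, fused c_attn projection 768 -> 3 * 768
trainable = lora_param_count(in_features=768, out_features=3 * 768, rank=8, n_layers=12)
total = 124_734_720  # "all params" figure reported in the output above

print(trainable)                          # 294912
print(f"{100 * trainable / total:.4f}%")  # 0.2364%
```

Doubling `r` doubles the adapter size, so the rank is the main lever for trading capacity against memory.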




In [None]:
#Training Setup

training_args = TrainingArguments(
    output_dir="./linkedin-llm",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=2e-4,
    logging_steps=1,  # Log every optimizer step; this tiny dataset yields only a handful of steps
    save_steps=100,
    save_total_limit=1,
    fp16=True,
    report_to="none"
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],  # Specify the training split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)


trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Step,Training Loss
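On a dataset this small, the run contains very few optimizer steps. The sketch below estimates the count for a single device. Exactly how a partial final accumulation group is counted can differ between library versions, so it reports a lower and an upper bound rather than the `Trainer`'s exact formula (an assumption on my part); `optimizer_step_bounds` is a hypothetical helper.

```python
import math

def optimizer_step_bounds(n_examples, batch_size, grad_accum, epochs):
    """Lower/upper bound on optimizer steps for a single-device run."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    # Lower bound: a partial accumulation group at the end of an epoch is dropped
    low = max(batches_per_epoch // grad_accum, 1) * epochs
    # Upper bound: the partial group still triggers an optimizer step
    high = math.ceil(batches_per_epoch / grad_accum) * epochs
    return low, high

low, high = optimizer_step_bounds(n_examples=11, batch_size=2, grad_accum=4, epochs=3)
print(low, high)  # 3 6
```

A logging interval larger than the upper bound never fires during such a run, which would leave the Step/Training Loss table empty.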


In [None]:
peft_model.save_pretrained("./linkedin-llm")
tokenizer.save_pretrained("./linkedin-llm")

In [None]:
# Generate LinkedIn posts locally
from peft import PeftModel, PeftConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline

# Load the PEFT configuration
config = PeftConfig.from_pretrained("./linkedin-llm")

# Define the quantization configuration.
# 8-bit loading takes no compute-dtype, quant-type, or double-quant options;
# those settings exist only as bnb_4bit_* arguments for 4-bit loading.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)


# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, quantization_config=bnb_config, device_map="auto")

# Load the PEFT model
peft_model = PeftModel.from_pretrained(base_model, "./linkedin-llm")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("./linkedin-llm")

# Create the text generation pipeline
pipe = pipeline("text-generation", model=peft_model, tokenizer=tokenizer)

In [None]:
# Use the same field layout as the training text so the prompt matches what the model was fine-tuned on
prompt = """Theme: career switch
Tone: confident
Persona: Data Analyst
Input: Transitioned to Data Scientist at Microsoft
Output:"""

In [None]:
response = pipe(
    prompt,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    top_k=50,
    do_sample=True,
    num_return_sequences=1
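)[0]['generated_text']

# Show the result (by default the pipeline returns the prompt followed by the generated continuation)
print(response)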
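By default the text-generation pipeline returns the prompt together with the continuation, so the post itself has to be separated from the echoed prompt. A minimal sketch follows; `extract_post` is a hypothetical helper, and the demo strings are made up for illustration.

```python
def extract_post(generated_text, prompt):
    """Drop the echoed prompt and surrounding whitespace from a generation."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()

# Illustrative strings only; real output would come from pipe(...)
demo_prompt = "Input: Started new job\nOutput:"
demo_output = demo_prompt + " Thrilled to share that I have started a new role!\n"
print(extract_post(demo_output, demo_prompt))
```

Passing `return_full_text=False` to the pipeline call should achieve a similar result without post-processing.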

In [None]:
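# Deploy model locally (inference server)

# The `text-generation` package is the Python client for a Text Generation Inference (TGI) server
!pip install text-generation

# text-generation-launcher ships with TGI itself, not with the pip package above.
# --quantize gptq expects weights already quantized with GPTQ, which ./linkedin-llm is not, so the flag is omitted.
!text-generation-launcher --model-id ./linkedin-llm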
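For a small private deployment, a lighter alternative to a dedicated inference server is a tiny HTTP endpoint built on the Python standard library. This is not TGI, just a sketch of the idea. `handle_generate`, `make_handler`, and `serve` are hypothetical names, and `generate` stands for any callable that maps a prompt string to generated text (for example, a wrapper around `pipe`).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_generate(body, generate):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        prompt = json.loads(body or b"{}")["prompt"]
        if not isinstance(prompt, str):
            raise TypeError("prompt must be a string")
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected JSON with a 'prompt' string"}).encode()
    return 200, json.dumps({"post": generate(prompt)}).encode()

def make_handler(generate):
    """Build a request handler class bound to the given generate callable."""
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, data = handle_generate(self.rfile.read(length), generate)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)
    return Handler

def serve(generate, host="127.0.0.1", port=8000):
    """Blocks until interrupted; binding to localhost keeps the endpoint private."""
    HTTPServer((host, port), make_handler(generate)).serve_forever()

# Stub generator for demonstration; no model is loaded here
status, data = handle_generate(b'{"prompt": "Theme: hiring"}', lambda p: p.upper())
print(status, data.decode())
```

Calling `serve(lambda p: pipe(p, max_new_tokens=200)[0]["generated_text"])` would expose the fine-tuned model on `http://127.0.0.1:8000`, assuming `pipe` is defined as in the generation cell.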

In [None]:
#Build a Simple Web UI
!pip install streamlit

In [None]:
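%%writefile linkedin_generator_ui.py
import streamlit as st
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline

@st.cache_resource  # Load the model once, not on every Streamlit rerun
def load_pipe(model_dir="./linkedin-llm"):
    # The script runs in its own process, so it must load the fine-tuned model itself
    model = AutoPeftModelForCausalLM.from_pretrained(model_dir)
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)

pipe = load_pipe()

st.title("LinkedIn Post Generator (Local LLM)")
theme = st.selectbox("Theme", ["New Job", "Promotion", "Hiring", "Career Advice"])
tone = st.selectbox("Tone", ["Confident", "Humble", "Inspirational", "Professional"])
persona = st.text_input("Your Role", "Software Engineer")
event = st.text_area("Event Description", "Started a new role at Google")

if st.button("Generate Post"):
    # Same field layout as the training text
    prompt = (
        f"Theme: {theme.lower()}\nTone: {tone.lower()}\n"
        f"Persona: {persona}\nInput: {event}\nOutput:"
    )
    response = pipe(prompt, max_new_tokens=200, do_sample=True)[0]['generated_text']
    st.write(response)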

In [None]:
!streamlit run linkedin_generator_ui.py