<a href="https://colab.research.google.com/github/satishkhanna/Challenges_in_Machine_Learning_project/blob/master/quickstarts/Prompting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2024 Google LLC.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Prompting Quickstart

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Prompting.ipynb"><img src="https://github.com/google-gemini/cookbook/blob/main/images/colab_logo_32px.png?raw=1" />Run in Google Colab</a>
  </td>
</table>

This notebook contains examples of how to write and run your first prompts with the Gemini API.

## Learn more

There's lots more to learn!

* For more fun prompts, check out [Market a Jetpack](https://github.com/google-gemini/cookbook/blob/main/examples/Market_a_Jet_Backpack.ipynb).
* Check out the [safety quickstart](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Safety.ipynb) next to learn about the Gemini API's configurable safety settings, and what to do if your prompt is blocked.
* For lots more details on using the Python SDK, check out this [detailed quickstart](https://ai.google.dev/tutorials/python_quickstart).

In [2]:
import pandas as pd
import spacy
import numpy as np
import nltk
import openpyxl

In [50]:
df = pd.read_csv("/content/complaints.csv")

In [51]:
nlp = spacy.load ('en_core_web_sm')
doc = nlp(df.Sub_issue.iloc[0])

In [56]:
tokens = []
lemma = []
pos = []

for doc in nlp.pipe (df['Sub_issue'].astype('unicode').values, batch_size = 50):
  if doc.is_parsed:
    tokens.append([n.text for n in doc])
    lemma.append([n.lemma_ for n in doc])
    pos.append([n.pos_ for n in doc])
  else:
    tokens.append(None)
    lemma.append(None)
    pos.append(None)

df['issue_tokens'] = tokens
df['issue_lemma'] = lemma
df['issue_pos'] = pos

  if doc.is_parsed:


In [58]:
def to_doc(words:tuple) -> spacy.tokens.Doc:
  return nlp(' '.join(words))

def remove_stops(doc) -> list:
  return [token.text for token in doc if not token.is_stop]

In [60]:
docs = list(map(to_doc, df.issue_lemma))
df['removed_stops'] = list(map(remove_stops, docs))

In [64]:
import re
#Remove punctuation
df['removed_stops_proces'] = df['removed_stops'].map(lambda x: re.sub("[,\.!?]","",str(x)))
#convert to lower
df['removed_stops_proces'] = df['removed_stops_proces'].map(lambda x:x.lower())

In [66]:
df['removed_stops_proces'] = df['removed_stops_proces'].str.replace("'",'')

In [68]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

In [69]:
cv = CountVectorizer(max_df = 0.9, min_df = 2)
dtm = cv.fit_transform(df['removed_stops_proces'])

In [70]:
LDA = LatentDirichletAllocation(n_components = 6, random_state = 42)
LDA.fit(dtm)

In [73]:
LDA.components_[2]

array([1.66666949e-01, 1.66718690e-01, 1.66667273e-01, 1.66670633e-01,
       1.66667902e-01, 1.66667913e-01, 1.66666952e-01, 1.66666887e-01,
       1.66757910e-01, 1.41666533e+01, 1.66708421e-01, 1.66667378e-01,
       1.66666768e-01, 1.66881724e-01, 1.66666801e-01, 1.66666944e-01,
       1.66714690e-01, 1.66666928e-01, 1.66666829e-01, 1.66666831e-01,
       1.66666826e-01, 1.66667316e-01, 1.04705814e+01, 1.66667561e-01,
       1.66666910e-01, 1.66667089e-01, 2.41624258e+01, 4.16664844e+00,
       1.66667135e-01, 1.66908366e-01, 1.66667338e-01, 1.66666956e-01,
       1.66667769e-01, 1.66667338e-01, 1.66929954e-01, 1.66668434e-01,
       1.66910484e-01, 1.66666796e-01, 1.66817093e-01, 3.44616596e+03,
       1.66777266e-01, 1.66666980e-01, 1.66666789e-01, 1.89815693e+03,
       1.66668235e-01, 1.66905507e-01, 1.66666950e-01, 1.66881724e-01,
       1.66685720e-01, 1.67370158e-01, 1.66723281e-01, 1.66667102e-01,
       4.15270413e+00, 1.66667166e-01, 1.66667117e-01, 1.66666943e-01,
      

In [75]:
for i,topic in enumerate (LDA.components_):
  print(f'The top 10 words for topic #{i}')
  print([cv.get_feature_names_out()[index] for index in topic.argsort()[-10:]])
  print ('\n')

The top 10 words for topic #0
['repeat', 'process', 'card', 'open', 'knowledge', 'consent', 'investigation', 'report', 'error', 'fix']


The top 10 words for topic #1
['theft', 'disclose', 'notification', 'reappear', 'away', 'old', 'wrong', 'attempt', 'collect', 'debt']


The top 10 words for topic #2
['score', 'problem', 'nan', 'inquiry', 'recognize', 'credit', 'improperly', 'use', 'company', 'report']


The top 10 words for topic #3
['phone', 'difficulty', 'submit', 'card', 'dispute', 'personal', 'status', 'information', 'account', 'incorrect']


The top 10 words for topic #4
['record', 'inaccurate', 'dispute', 'problem', 'result', 'status', 'notify', '30', 'day', 'investigation']


The top 10 words for topic #5
['disburse', 'instruct', 'handle', 'insurance', 'fund', 'communicate', 'issue', 'miss', 'belong', 'information']




In [76]:
topic_results = LDA.transform(dtm)
df['Topic'] = topic_results.argmax(axis = 1)

In [78]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [79]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

In [80]:
df['removed_stops_proces'] = df['removed_stops_proces'].str.replace("[",'')
df['removed_stops_proces'] = df['removed_stops_proces'].str.replace("[",'')

In [82]:
df['scores'] = df['removed_stops_proces'].apply(lambda removed_stops_proces: sid.polarity_scores(removed_stops_proces))
df['compound'] = df['scores'].apply(lambda d:d['compound'])
df['comp_score'] = df['compound'].apply (lambda score: 'positive' if score > 0 else ('negative' if score < 0 else 'neutral'))
df['neg_score'] = df['scores'].apply (lambda x:x.get('neg'))
df['sentiment'] = np.where (df['neg_score']>0,'negative',np.where (df['compound']<0,'negative', np.where (df['compound']>0,'positive','neutral')))

In [83]:
df

Unnamed: 0,Date received,Product,Sub-product,Issue,Sub_issue,Consumer_complaint_narrative,Company public response,Company,State,ZIP code,...,issue_lemma,issue_pos,removed_stops,removed_stops_proces,Topic,scores,compound,comp_score,neg_score,sentiment
0,10/26/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Information belongs to someone else,,,"EQUIFAX, INC.",PA,19153,...,"[information, belong, to, someone, else]","[NOUN, VERB, ADP, PRON, ADV]","[information, belong]",information belong],5,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
1,10/26/2024,Credit reporting or other personal consumer re...,Credit reporting,Improper use of your report,Credit inquiries on your report that you don't...,,,"EQUIFAX, INC.",SC,29212,...,"[credit, inquiry, on, your, report, that, you,...","[NOUN, NOUN, ADP, PRON, NOUN, SCONJ, PRON, AUX...","[credit, inquiry, report, recognize]",credit inquiry report recognize],2,"{'neg': 0.0, 'neu': 0.536, 'pos': 0.464, 'comp...",0.3818,positive,0.0,positive
2,10/18/2024,Credit reporting or other personal consumer re...,Credit reporting,Problem with a company's investigation into an...,Was not notified of investigation status or re...,,,"EQUIFAX, INC.",SC,29418,...,"[be, not, notify, of, investigation, status, o...","[AUX, PART, VERB, ADP, NOUN, NOUN, CCONJ, NOUN]","[notify, investigation, status, result]",notify investigation status result],4,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
3,10/26/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Account information incorrect,,,"EQUIFAX, INC.",SC,29483,...,"[account, information, incorrect]","[NOUN, NOUN, ADJ]","[account, information, incorrect]",account information incorrect],3,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
4,10/26/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Information belongs to someone else,,,"EQUIFAX, INC.",LA,70122,...,"[information, belong, to, someone, else]","[NOUN, VERB, ADP, PRON, ADV]","[information, belong]",information belong],5,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19779,10/23/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Account information incorrect,,,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",FL,32811,...,"[account, information, incorrect]","[NOUN, NOUN, ADJ]","[account, information, incorrect]",account information incorrect],3,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
19780,10/23/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Account information incorrect,,,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",FL,32811,...,"[account, information, incorrect]","[NOUN, NOUN, ADJ]","[account, information, incorrect]",account information incorrect],3,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
19781,10/18/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Information belongs to someone else,,,Experian Information Solutions Inc.,CA,92602,...,"[information, belong, to, someone, else]","[NOUN, VERB, ADP, PRON, ADV]","[information, belong]",information belong],5,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
19782,10/21/2024,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Account status incorrect,,,"EQUIFAX, INC.",FL,34771,...,"[account, status, incorrect]","[NOUN, NOUN, NOUN]","[account, status, incorrect]",account status incorrect],3,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...",0.0000,neutral,0.0,neutral
