# Classification using NLI (Natural Language Inference)

- Summarize the full text, then classify the summary.
- Classifying the original text directly does not work well.
- Classifying the summary works well.

# 0. Setup

In [1]:
!pip -q install -U transformers langchain

In [22]:
import json
import boto3
import textwrap

from langchain.llms.bedrock import Bedrock
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from transformers import pipeline

In [23]:
profile_name = None
region = 'us-east-1'

In [24]:
session = boto3.Session(
    profile_name=profile_name,
    region_name=region,
)
bedrock = session.client(service_name='bedrock-runtime')

In [25]:
modelId = 'anthropic.claude-v2'

In [26]:
llm = Bedrock(
    model_id=modelId,
    model_kwargs={
        "max_tokens_to_sample": 4096,
        "top_p": 0.9,
        "temperature": 0,
    },
    client=bedrock,
)

# 1. Data Load

- [Anti-corruption layer pattern](https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/acl.html)

In [31]:
with open('acl.txt', 'r', encoding='utf-8') as fp:
    doc = fp.read()
textwrap.shorten(doc, width=80, placeholder=' ...')

'Anti-corruption layer pattern PDF RSS Intent The anti-corruption layer (ACL) ...'

# 2. Summarize

In [35]:
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n"],
    chunk_size=3000,
    chunk_overlap=100,
)
docs = text_splitter.create_documents([doc])
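
To make the splitting step concrete, here is a minimal sketch of paragraph-based chunking with overlap. It is a simplified stand-in written for illustration, not LangChain's actual implementation; the function name `split_on_paragraphs` and its greedy packing rule are assumptions. It splits only on the separator, so a single paragraph longer than `chunk_size` is kept whole.

```python
def split_on_paragraphs(text, chunk_size=3000, chunk_overlap=100, sep="\n\n"):
    """Greedily pack paragraphs into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only; it approximates the idea behind
    a separator-based splitter and is not LangChain's code.
    """
    paragraphs = [p for p in text.split(sep) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = sep.join(current + [para])
        if current and len(candidate) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(sep.join(current))
            # Carry trailing paragraphs forward while they fit in the overlap budget.
            carried = []
            for prev in reversed(current):
                if len(sep.join([prev] + carried)) > chunk_overlap:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(para)
    if current:
        chunks.append(sep.join(current))
    return chunks
```

With `chunk_overlap=100` and paragraphs that are mostly longer than 100 characters, this sketch would carry nothing forward, so the overlap only matters when the trailing paragraphs are short.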

In [36]:
summary_chain = load_summarize_chain(
    llm=llm,
    chain_type="map_reduce",
    verbose=False,
)
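
The `map_reduce` chain type can be pictured as two passes: summarize each chunk on its own, then summarize the combined partial summaries. The sketch below shows only that control flow. `map_reduce_summarize` and the `summarize` callable are hypothetical names; in the notebook the LLM plays the role of `summarize`, and the real chain may add extra collapsing rounds when the partial summaries are too long.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Two-pass summarization: map over chunks, then reduce the partial summaries.

    `summarize` is any text -> text function (an assumption for illustration).
    """
    # Map step: each chunk is summarized independently of the others.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: the partial summaries are joined and summarized once more.
    return summarize("\n\n".join(partial_summaries))


def first_sentence(text: str) -> str:
    """Toy stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    print(map_reduce_summarize(
        ["Alpha is first. More text.", "Beta is second. Extra."],
        first_sentence,
    ))
```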

In [37]:
output = summary_chain(docs)

In [38]:
summary = output['output_text'].strip()
print(summary)

Here is a concise summary of the key points:

- The anti-corruption layer (ACL) pattern provides an adapter between systems with different semantics to enable communication without modifying either system. 

- It decouples the systems so changes to one don't require changes to the other. 

- Useful for migrating monoliths to microservices, integrating with external systems, or connecting bounded contexts.

- Reduces coordination needs and disruption from redirecting calls during incremental migration.

- Adds operational overhead - needs monitoring, alerting, CI/CD. 

- Implement as shared converter or service-specific class. Consider latency, scaling, failure tolerance.

- Can be implemented inside monolith or as separate service. Decommission after migration complete.

- Allows calling migrated services from monolith without changing monolith during incremental migration.

- Minimizes risk and disruption during migration process.


# 3. Classify using NLI

- Classify the text into categories with an NLI model.
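
As a rough picture of how NLI-based zero-shot classification scores labels: each candidate label is turned into a hypothesis sentence, the NLI model scores how strongly the text entails each hypothesis, and in the single-label case the entailment logits are normalized with a softmax across labels. The sketch below reproduces only that scoring arithmetic. The helper names are hypothetical, the template is, to my understanding, the pipeline's default, and the logits in the usage example are made-up values, not model output.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def scores_from_entailment_logits(labels, entailment_logits):
    """Softmax the per-label entailment logits and rank labels by score.

    Hypothetical helper: it mirrors the single-label scoring idea only and
    does not run any model.
    """
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    ranked = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


if __name__ == "__main__":
    labels = ["software engineer", "web designer", "digital marketer"]
    print(build_hypotheses(labels))
    # Made-up logits for illustration only; real values come from the NLI model.
    print(scores_from_entailment_logits(labels, [2.0, 0.5, -1.0]))
```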

In [15]:
classifier = pipeline(
    task='zero-shot-classification',
    model='facebook/bart-large-mnli',
)

In [16]:
candidate_labels = ['software engineer', 'web designer', 'digital marketer']

In [39]:
len(doc)

9981

In [40]:
%%time

res = classifier(doc, candidate_labels)
print(f'{json.dumps(res, indent=2)}\n')

{
  "sequence": "Anti-corruption layer pattern\nPDF\nRSS\nIntent\n\nThe anti-corruption layer (ACL) pattern acts as a mediation layer that translates domain model semantics from one system to another system. It translates the model of the upstream bounded context (monolith) into a model that suits the downstream bounded context (microservice) before consuming the communication contract that's established by the upstream team. This pattern might be applicable when the downstream bounded context contains a core subdomain, or the upstream model is an unmodifiable legacy system. It also reduces transformation risk and business disruption by preventing changes to callers when their calls have to be redirected transparently to the target system.\nMotivation\n\nDuring the migration process, when a monolithic application is migrated into microservices, there might be changes in the domain model semantics of the newly migrated service. When the features within the monolith are required to call 

In [41]:
len(summary)

945

In [42]:
%%time

res = classifier(summary, candidate_labels)
print(f'{json.dumps(res, indent=2)}\n')

{
  "sequence": "Here is a concise summary of the key points:\n\n- The anti-corruption layer (ACL) pattern provides an adapter between systems with different semantics to enable communication without modifying either system. \n\n- It decouples the systems so changes to one don't require changes to the other. \n\n- Useful for migrating monoliths to microservices, integrating with external systems, or connecting bounded contexts.\n\n- Reduces coordination needs and disruption from redirecting calls during incremental migration.\n\n- Adds operational overhead - needs monitoring, alerting, CI/CD. \n\n- Implement as shared converter or service-specific class. Consider latency, scaling, failure tolerance.\n\n- Can be implemented inside monolith or as separate service. Decommission after migration complete.\n\n- Allows calling migrated services from monolith without changing monolith during incremental migration.\n\n- Minimizes risk and disruption during migration process.",
  "labels": [
   