In [1]:
from transformers import BertForSequenceClassification, BertTokenizer
import torch


In [2]:
tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')

In [3]:
txt = """
What Are The Disadvantages of Cryptocurrency?
Investing in cryptocurrency might look appealing and profitable but investors should also consider a few downsides to it. 

Cryptocurrency claims to be an anonymous form of transaction, but they are actually pseudonymous which means they leave a digital trail that the Federal Bureau of Investigation can decode. So, there’s a possibility of interference from federal or government authorities to track the financial transactions of normal people.  
On a blockchain, there is a constant risk of a 51% attack which means It is a situation when a miner or group of them gets more than 50% of the network’s mining hash rate control. While in control, an ill-natured group can reverse the transaction that is completed, pause the transaction in process, double spend coins, prevent new transactions from getting validation and much more. Nevertheless, this attack is only a risk to recently hard-forked networks and new blockchains.
The majority of blockchains work on the proof-of-work consensus mechanism. Network participants are required to use powerful ASIC computers and the right hash to make a block added to the network. Due to this, there is excessive power consumption and countries are taking majors to lower its impact on the environment. 
The lack of key policies related to transactions serves as a major drawback of cryptocurrencies. The no refund or cancellation policy can be considered the default stance for transactions wrongly made across crypto wallets and each crypto stock exchange or app has its own rules.
Are Cryptocurrencies Legal In India?
Cryptocurrencies as a payment medium are not regulated or issued by any central authority in India. There are no guidelines laid down for sorting disagreements while dealing with cryptocurrency. So, if you wish to trade in crypto, do it at your own risk. 

Nirmala Sitharaman, the Finance Minister of India, initiated a tax on digital assets that has increased the discussion on the cryptocurrency legality in the country. 

Given the stance of the Reserve Bank Of India (RBI) Governor and other key ministers from time to time, it can be safe to state cryptocurrency is not banned in India. Till 2022, cryptocurrency was unregulated in the country. This changed after the government set forth a 30% and 1% tax on profits from cryptocurrencies and tax deducted at source respectively in the Union Budget of 2022. This event marked the Indian government’s official regulation of cryptocurrency in the country. 

While many supported the decision as it marks the very start of the road to getting cryptocurrency recognition, the Government of India still has to issue an official note for cryptocurrencies to be considered legal in India. 

Tax on Cryptocurrency in India
Tax on cryptocurrency is one of the most confusing investment aspects in India. In the beginning years, there was no income tax or goods and services tax (GST) on cryptocurrencies in India but in the recent Union Budget 2022, a tax regime for digital or virtual assets that include cryptocurrency has been introduced. 

Crypto investors are required to keep a well-calculated record of losses and gains as a part of their income.
On the earnings from the transfer of virtual or digital assets, a 30% tax will be charged. The tax includes cryptocurrencies, NFTs, etc.
Cost of acquisition along with no deduction will be permitted while reporting gains from the transfer of virtual or digital assets.
A tax of 1% on tax deducted at source (TDS) on the buyer’s payment if it crosses the threshold limit.
If someone receives cryptocurrency as a gift or it is transferred then it is subjected to tax at the beneficiary’s end. 
If investors face any loss from the virtual or digital asset investment, it cannot be recovered against other income.
"""

In [4]:
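# add_special_tokens=False: [CLS] and [SEP] are added to each chunk later, after splitting.
tokens = tokenizer.encode_plus(txt, add_special_tokens=False, return_tensors='pt')

# len(tokens) counts the keys of the encoding (input_ids, token_type_ids,
# attention_mask), not the number of tokens.
print(len(tokens))

tokens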

Token indices sequence length is longer than the specified maximum sequence length for this model (825 > 512). Running this sequence through the model will result in indexing errors


3


{'input_ids': tensor([[ 2054,  2024,  1996, 20502,  2015,  1997, 19888, 10085,  3126,  7389,
          5666,  1029, 19920,  1999, 19888, 10085,  3126,  7389,  5666,  2453,
          2298, 16004,  1998, 15282,  2021,  9387,  2323,  2036,  5136,  1037,
          2261, 12482,  8621,  2000,  2009,  1012, 19888, 10085,  3126,  7389,
          5666,  4447,  2000,  2022,  2019, 10812,  2433,  1997, 12598,  1010,
          2021,  2027,  2024,  2941, 13881,  3560,  2029,  2965,  2027,  2681,
          1037,  3617,  4446,  2008,  1996,  2976,  4879,  1997,  4812,  2064,
         21933,  3207,  1012,  2061,  1010,  2045,  1521,  1055,  1037,  6061,
          1997, 11099,  2013,  2976,  2030,  2231,  4614,  2000,  2650,  1996,
          3361, 11817,  1997,  3671,  2111,  1012,  2006,  1037,  3796, 24925,
          2078,  1010,  2045,  2003,  1037,  5377,  3891,  1997,  1037,  4868,
          1003,  2886,  2029,  2965,  2009,  2003,  1037,  3663,  2043,  1037,
         18594,  2030,  2177,  1997,  

In [5]:
len(tokens['input_ids'][0])

825

In [6]:
# 510 = 512 - 2: each chunk leaves two positions free for [CLS] and [SEP].
input_id_chunks = tokens['input_ids'][0].split(510)
attention_mask_chunks = tokens['attention_mask'][0].split(510)

In [7]:
len(attention_mask_chunks)

2

In [8]:
len(attention_mask_chunks[0])

510

In [9]:
input_id_chunks[0].shape

torch.Size([510])
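
The split size of 510 leaves two positions per chunk for [CLS] and [SEP]. As a quick check of the arithmetic, the sketch below works out how many chunks an 825-token sequence yields and how much padding the last chunk needs. It uses only the numbers printed above; the variable names are illustrative and not part of the notebook.

```python
import math

n_tokens = 825         # length reported by len(tokens['input_ids'][0])
max_len = 512          # model limit quoted in the tokenizer warning
payload = max_len - 2  # room left after reserving [CLS] and [SEP]

n_chunks = math.ceil(n_tokens / payload)
last_chunk_tokens = n_tokens - (n_chunks - 1) * payload
padding_needed = max_len - (last_chunk_tokens + 2)

print(n_chunks, last_chunk_tokens, padding_needed)  # 2 315 195
```

So the first chunk is full and the second carries 315 text tokens, 2 special tokens, and 195 padding positions.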

In [10]:
def get_input_ids_and_attention_mask_chunk(chunksize=512):
    """
    Split the global `tokens` encoding into chunks of length `chunksize`.

    Each chunk holds up to `chunksize - 2` text tokens plus the special tokens
    101 ([CLS]) at the start and 102 ([SEP]) at the end. Chunks shorter than
    `chunksize` are padded with zeros; padded positions get attention mask 0.

    Returns:
        input_id_chunks (List[torch.Tensor]): List of chunked input_ids.
        attention_mask_chunks (List[torch.Tensor]): List of chunked attention_masks.
    """
    input_id_chunks = list(tokens['input_ids'][0].split(chunksize - 2))
    attention_mask_chunks = list(tokens['attention_mask'][0].split(chunksize - 2))

    for i in range(len(input_id_chunks)):
        # Wrap the chunk in [CLS] ... [SEP]; both special tokens are attended to.
        input_id_chunks[i] = torch.cat([
            torch.tensor([101]), input_id_chunks[i], torch.tensor([102])
        ])
        attention_mask_chunks[i] = torch.cat([
            torch.tensor([1]), attention_mask_chunks[i], torch.tensor([1])
        ])

        pad_length = chunksize - input_id_chunks[i].shape[0]

        if pad_length > 0:
            # Pad with integer zeros. torch.Tensor([0] * n) would create floats
            # and change the dtype of the concatenated chunk.
            padding = torch.zeros(pad_length, dtype=torch.long)
            input_id_chunks[i] = torch.cat([input_id_chunks[i], padding])
            attention_mask_chunks[i] = torch.cat([attention_mask_chunks[i], padding])

    return input_id_chunks, attention_mask_chunks
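
To see what the chunking does without loading the model, the same steps can be run on a toy input. This is a minimal sketch, not the notebook's function: `chunk_and_pad` is a hypothetical helper that takes the tensors as arguments instead of reading the global `tokens`, and the token ids are made up.

```python
import torch

def chunk_and_pad(ids, mask, chunksize, cls_id=101, sep_id=102, pad_id=0):
    """Hypothetical helper: split 1-D id/mask tensors into padded rows of length chunksize."""
    id_rows, mask_rows = [], []
    for id_part, mask_part in zip(ids.split(chunksize - 2), mask.split(chunksize - 2)):
        # Add the special tokens around the text tokens of this chunk.
        row_ids = torch.cat([torch.tensor([cls_id]), id_part, torch.tensor([sep_id])])
        row_mask = torch.cat([torch.tensor([1]), mask_part, torch.tensor([1])])
        # Pad up to chunksize; pad is 0 for a full chunk.
        pad = chunksize - row_ids.shape[0]
        row_ids = torch.cat([row_ids, torch.full((pad,), pad_id, dtype=torch.long)])
        row_mask = torch.cat([row_mask, torch.zeros(pad, dtype=torch.long)])
        id_rows.append(row_ids)
        mask_rows.append(row_mask)
    return torch.stack(id_rows), torch.stack(mask_rows)

# Toy input: 7 made-up token ids, chunk size 6 (4 text tokens + 2 special tokens).
toy_ids = torch.arange(1000, 1007)
toy_mask = torch.ones(7, dtype=torch.long)
batch_ids, batch_mask = chunk_and_pad(toy_ids, toy_mask, chunksize=6)
print(batch_ids)
print(batch_mask)
```

The first row is full; the second holds three text tokens, the two special tokens, and one padding position whose mask entry is 0.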

In [11]:
input_id_chunks, attention_mask_chunks = get_input_ids_and_attention_mask_chunk()

In [12]:
input_ids = torch.stack(input_id_chunks)
attention_mask = torch.stack(attention_mask_chunks)

input_dict = {
    'input_ids' : input_ids.long(),
    'attention_mask' : attention_mask.int()
}

input_dict

{'input_ids': tensor([[ 101, 2054, 2024,  ..., 1996, 2586,  102],
         [ 101, 5166, 1997,  ...,    0,    0,    0]]),
 'attention_mask': tensor([[1, 1, 1,  ..., 1, 1, 1],
         [1, 1, 1,  ..., 0, 0, 0]], dtype=torch.int32)}

In [13]:
input_dict['input_ids'].shape

torch.Size([2, 512])

In [14]:
input_dict['attention_mask'].shape

torch.Size([2, 512])

In [15]:
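# outputs.logits has one row of raw scores per chunk (the same tensor as outputs[0]).
outputs = model(**input_dict)

# Convert each chunk's logits to class probabilities.
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Average over the chunk dimension to get one distribution for the whole text.
mean_probabilities = probabilities.mean(dim=0)

mean_probabilities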

tensor([0.0269, 0.2417, 0.7314], grad_fn=<MeanBackward1>)
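
The aggregation step is a row-wise softmax followed by a mean over chunks. The sketch below repeats it on made-up logits (two chunks, three classes; these are not FinBERT outputs) to show that each row sums to 1 and that the averaged vector is still a probability distribution.

```python
import torch

# Made-up logits for two chunks and three classes (not FinBERT output).
toy_logits = torch.tensor([[0.0, 1.0, 2.0],
                           [2.0, 1.0, 0.0]])

toy_probs = torch.nn.functional.softmax(toy_logits, dim=-1)  # normalise each row
toy_mean = toy_probs.mean(dim=0)                             # average over chunks

print(toy_probs.sum(dim=-1))  # each row sums to 1
print(toy_mean.sum())         # the averaged vector also sums to 1
```

A plain mean gives every chunk equal weight, regardless of how many real (non-padding) tokens it contains.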

In [16]:
# Index of the most probable class; model.config.id2label maps it to a label name.
torch.argmax(mean_probabilities).item()

2
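
The index alone does not say which sentiment was predicted. A minimal sketch of the lookup follows, assuming the label order positive, negative, neutral for this checkpoint. That order is an assumption; in the notebook, `model.config.id2label` is the authoritative source and should be checked first.

```python
# Assumed label order for ProsusAI/finbert; verify against model.config.id2label.
assumed_id2label = {0: "positive", 1: "negative", 2: "neutral"}

# Values of mean_probabilities as printed earlier in the notebook.
mean_probs = [0.0269, 0.2417, 0.7314]

# Plain-Python argmax: index of the largest probability.
best = max(range(len(mean_probs)), key=mean_probs.__getitem__)
print(best, assumed_id2label[best])
```

Under that assumed mapping, index 2 would correspond to a neutral overall sentiment for the text.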