link : https://becominghuman.ai/text-summarization-in-5-steps-using-nltk-65b21e352b65

## Importing required tools 

In [1]:
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize 

In [2]:
def create_frequency_table(text_string) -> dict:
    """
        This function returns a dictionary for a given string (sentence)
        which contains the word as the key and the number of times it appears 
        the string as its values.
    """
    stopWords = set(stopwords.words("english"))
    words = word_tokenize(text_string)
    ps = PorterStemmer()
    
    freqTable = dict()
    
    for word in words:
        word = ps.stem(word)
        if word in stopWords:
            continue
        if word in freqTable:
            freqTable[word] += 1
        else:
            freqTable[word] = 1
    return freqTable
    

In [3]:
def score_sentences(sentences,freqTable) -> dict:
    """
    This function returns the score for each sentence according to how important words 
    it contains or how many words it contains from the dictionary table in the form of
    dictionary.
    """
    sentenceValue = dict()
    n = 15
    for sentence in sentences:
        word_count_in_sentence = (len(word_tokenize(sentence)))
        for wordValue in freqTable:
            if wordValue in sentence.lower():
                if sentence[:n] in sentenceValue:
                    sentenceValue[sentence[:n]] += freqTable[wordValue]
                else:
                    sentenceValue[sentence[:n]] = freqTable[wordValue]
        sentenceValue[sentence[:n]] = sentenceValue[sentence[:n]] // word_count_in_sentence
    return sentenceValue

In [4]:
def find_average_score(sentenceValue) -> int:
    sumValues = 0
    for entry in sentenceValue:
        sumValues += sentenceValue[entry]
    # Average value of a sentence from the original text
    average = int (sumValues / len(sentenceValue))
    
    return average

In [5]:
def generate_summary(sentences,sentenceValue,threshold) -> str:
    sentence_count = 0
    summary = ''
    n = 15
    
    for sentence in sentences:
        if sentence[:n] in sentenceValue and sentenceValue[sentence[:n]] > (threshold):
            summary += ' '+sentence
            sentence_count += 1
    
    return summary

In [6]:
def summarize(text,neg_threshold= 0) -> str:
    """
        neg_threshold is used to subtract the number from the calculated threshold if 
        it is too big.
    """
    freq_table = create_frequency_table(text)
    sentences = sent_tokenize(text)
    sentence_score = score_sentences(sentences,freq_table)
    threshold = find_average_score(sentence_score)
    summary = generate_summary(sentences,sentence_score,threshold - neg_threshold)
    return summary

In [7]:
text = """
Those Who Are Resilient Stay In The Game Longer
“On the mountains of truth you can never climb in vain: either you will reach a point higher up today, or you will be training your powers so that you will be able to climb higher tomorrow.” — Friedrich Nietzsche
Challenges and setbacks are not meant to defeat you, but promote you. However, I realise after many years of defeats, it can crush your spirit and it is easier to give up than risk further setbacks and disappointments. Have you experienced this before? To be honest, I don’t have the answers. I can’t tell you what the right course of action is; only you will know. However, it’s important not to be discouraged by failure when pursuing a goal or a dream, since failure itself means different things to different people. To a person with a Fixed Mindset failure is a blow to their self-esteem, yet to a person with a Growth Mindset, it’s an opportunity to improve and find new ways to overcome their obstacles. Same failure, yet different responses. Who is right and who is wrong? Neither. Each person has a different mindset that decides their outcome. Those who are resilient stay in the game longer and draw on their inner means to succeed.
I’ve coached many clients who gave up after many years toiling away at their respective goal or dream. It was at that point their biggest breakthrough came. Perhaps all those years of perseverance finally paid off. It was the 19th Century’s minister Henry Ward Beecher who once said: “One’s best success comes after their greatest disappointments.” No one knows what the future holds, so your only guide is whether you can endure repeated defeats and disappointments and still pursue your dream. Consider the advice from the American academic and psychologist Angela Duckworth who writes in Grit: The Power of Passion and Perseverance: “Many of us, it seems, quit what we start far too early and far too often. Even more than the effort a gritty person puts in on a single day, what matters is that they wake up the next day, and the next, ready to get on that treadmill and keep going.”
I know one thing for certain: don’t settle for less than what you’re capable of, but strive for something bigger. Some of you reading this might identify with this message because it resonates with you on a deeper level. For others, at the end of their tether the message might be nothing more than a trivial pep talk. What I wish to convey irrespective of where you are in your journey is: NEVER settle for less. If you settle for less, you will receive less than you deserve and convince yourself you are justified to receive it.
“Two people on a precipice over Yosemite Valley” by Nathan Shipps on Unsplash
Develop A Powerful Vision Of What You Want
“Your problem is to bridge the gap which exists between where you are now and the goal you intend to reach.” — Earl Nightingale
I recall a passage my father often used growing up in 1990s: “Don’t tell me your problems unless you’ve spent weeks trying to solve them yourself.” That advice has echoed in my mind for decades and became my motivator. Don’t leave it to other people or outside circumstances to motivate you because you will be let down every time. It must come from within you. Gnaw away at your problems until you solve them or find a solution. Problems are not stop signs, they are advising you that more work is required to overcome them. Most times, problems help you gain a skill or develop the resources to succeed later. So embrace your challenges and develop the grit to push past them instead of retreat in resignation. Where are you settling in your life right now? Could you be you playing for bigger stakes than you are? Are you willing to play bigger even if it means repeated failures and setbacks? You should ask yourself these questions to decide whether you’re willing to put yourself on the line or settle for less. And that’s fine if you’re content to receive less, as long as you’re not regretful later.
If you have not achieved the success you deserve and are considering giving up, will you regret it in a few years or decades from now? Only you can answer that, but you should carve out time to discover your motivation for pursuing your goals. It’s a fact, if you don’t know what you want you’ll get what life hands you and it may not be in your best interest, affirms author Larry Weidel: “Winners know that if you don’t figure out what you want, you’ll get whatever life hands you.” The key is to develop a powerful vision of what you want and hold that image in your mind. Nurture it daily and give it life by taking purposeful action towards it.
Vision + desire + dedication + patience + daily action leads to astonishing success. Are you willing to commit to this way of life or jump ship at the first sign of failure? I’m amused when I read questions written by millennials on Quora who ask how they can become rich and famous or the next Elon Musk. Success is a fickle and long game with highs and lows. Similarly, there are no assurances even if you’re an overnight sensation, to sustain it for long, particularly if you don’t have the mental and emotional means to endure it. This means you must rely on the one true constant in your favour: your personal development. The more you grow, the more you gain in terms of financial resources, status, success — simple. If you leave it to outside conditions to dictate your circumstances, you are rolling the dice on your future.
So become intentional on what you want out of life. Commit to it. Nurture your dreams. Focus on your development and if you want to give up, know what’s involved before you take the plunge. Because I assure you, someone out there right now is working harder than you, reading more books, sleeping less and sacrificing all they have to realise their dreams and it may contest with yours. Don’t leave your dreams to chance.
"""

In [8]:
summarize(text)

' To be honest, I don’t have the answers. However, it’s important not to be discouraged by failure when pursuing a goal or a dream, since failure itself means different things to different people. Same failure, yet different responses. Neither. Each person has a different mindset that decides their outcome. It was at that point their biggest breakthrough came. Perhaps all those years of perseverance finally paid off. It must come from within you. Problems are not stop signs, they are advising you that more work is required to overcome them. Most times, problems help you gain a skill or develop the resources to succeed later. And that’s fine if you’re content to receive less, as long as you’re not regretful later. So become intentional on what you want out of life. Commit to it. Nurture your dreams. Focus on your development and if you want to give up, know what’s involved before you take the plunge. Don’t leave your dreams to chance.'

In [9]:
text = """Long ago, there lived a woodcutter in a small village.  He was sincere in his work and very honest.  Every day, he set out into the nearby forest to cut trees.  He brought the woods back into the village and sold them out to a merchant and earn his money.  He earned just about enough to make a living, but he was satisfied with his simple living.

One day, while cutting a tree near a river, his axe slipped out of his hand and fell into the river.  The river was so deep, he could not even think to retrieve it on his own. He only had one axe which was gone into the river. He became a very worried thinking how he will be able to earn his living now!  He was very sad and prayed to the God. He prayed sincerely so the God appeared in front of him and asked, “What is the problem, my son?” The woodcutter explained the problem and requested the God to get his axe back.

The God put his hand deep into the river and took out a silver axe and asked, “Is this your axe?”  The Woodcutter looked at the axe and said “No”.   So the God put his hand back deep into the water again and showed a golden axe and asked, “Is this your axe?”  The woodcutter looked at the axe and said “No”.  The God said, “Take a look again Son, this is a very valuable golden axe, are you sure this is not yours?”  The woodcutter said, “No, It’s not mine.  I can’t cut the trees with a golden axe.  It’s not useful for me”.

The God smiled and finally put his hand into the water again and took out his iron axe and asked, “Is this your axe?”  To this, the woodcutter said, “Yes!  This is mine!  Thank you!”  The Goddess was very impressed with his honesty so she gave him his iron axe and also other two axes as a reward for his honesty."""

In [10]:
summarize(text,1)

' Long ago, there lived a woodcutter in a small village. He was sincere in his work and very honest. Every day, he set out into the nearby forest to cut trees. He earned just about enough to make a living, but he was satisfied with his simple living. One day, while cutting a tree near a river, his axe slipped out of his hand and fell into the river. The river was so deep, he could not even think to retrieve it on his own. He only had one axe which was gone into the river. He was very sad and prayed to the God. He prayed sincerely so the God appeared in front of him and asked, “What is the problem, my son?” The woodcutter explained the problem and requested the God to get his axe back. The God put his hand deep into the river and took out a silver axe and asked, “Is this your axe?”  The Woodcutter looked at the axe and said “No”. So the God put his hand back deep into the water again and showed a golden axe and asked, “Is this your axe?”  The woodcutter looked at the axe and said “No”. 

In [11]:
text = """
A blockchain may include a series of data blocks, the blocks including a code, such as a cryptographic hash or checksum, which may be coding-consistent with the content of previous blocks in the series. In some cases, determining multiple different sets of blocks that produce the same integrity code may be insoluble, prohibitively computationally complex, or otherwise effort intensive enough to frustrate attempts to tamper with the contents of the blockchain while maintaining the self-consistence of the integrity codes. However, in some implementations a trusted party may have access to a key secret, or portion of a key secret, such that the party, acting alone or with those in possession of the other portions of the key secret, may edit the blockchain contents without leaving indication of tampering.
In various systems multiple parties may use a blockchain-based file or ledger to maintain a tamper-evident record of events, transactions, or other updates. In some cases, a blockchain may register tampering after a change made to the blockchain by an untrusted party, for example a party not in possession of the key secret. Thus, the parties may individually verify that updates by other parties are valid and coding-consistent with the previous data blocks of the blockchain. The self-consistence of the integrity codes allows the updates to the blockchain to be verified even if the party lacks an archived version of the blockchain to use as a reference. When a rewrite to one or more data blocks in a blockchain does not introduce coding-inconsistency among the integrity outputs and data block contents of the blocks in the blockchain, the rewrite may be characterized as preserving the validity of the blockchain.
A blockchain may be secured by an integrity code. An integrity code may produce a particular integrity output when particular data is provided as input to the integrity code. In some cases, when data different than the particular data is provided to the integrity code as input, the integrity code may produce a different integrity output. In an example scenario an integrity output from the integrity code generated from particular input data from a data block is stored and the data block is later changed. If the changed data is provided to the integrity code as input, the integrity code may produce an integrity output that is different or otherwise coding-inconsistent with the stored integrity output. Therefore, the change may be detected in this example scenario.
A blockchain may include a series of blocks where each subsequent block in the series holds the integrity output for a previous block. The series may form a chain of blocks in which each subsequent block holds an integrity output generated from the data present in the immediately prior block. Accordingly, if a block is changed, a coding-inconsistency with the integrity output stored in a subsequent block may be detected. Since the integrity outputs are part of the stored data in the blocks, changes to the integrity outputs themselves may also be detected through coding-inconsistencies. This self-consistency of the integrity code may be used to secure a blockchain with respect to covert tampering.
When secured by an integrity code, a tamper-evident change may include virtually any change for which a coding-inconsistency between the integrity outputs of the integrity code for a blockchain and the data within the blockchain can be detected. For example, the data in a block of the blockchain may be hashed, run through a checksum, or have another integrity code applied. If the data in the block is later found to conflict with the integrity output of the hash, checksum, or other integrity code, the change may be identified as tamper-evident. A conflict may occur when the data currently in a block does not produce an identical or equivalent integrity output to the earlier obtained integrity output when the integrity code is applied to the data currently in the block. When a change is made to a block and no coding-inconsistency with the previously stored integrity outputs of the integrity code can be detected afterward, that change may be non-tamper-evident. In some cases, a non-tamper-evident rewrite may be implemented by substituting a first block with a second block with different data content that produces the same (or an equivalent) integrity output.
In some cases, after entry, some blocks in a blockchain may include information that is no longer appropriate for inclusion in the blockchain. For example, blocks may expire after time or after a determined number of subsequent entries, private information may be included in the blocks, inaccurate entries may be included in the blocks, information prejudicial to one or more of the parties using the blockchain may be included in the blocks, incomplete information may be included, or other inappropriate information may be included. Accordingly, a trusted party, for example a neutral third party, a governing party, or a group of individually untrusted parties, may rewrite, remove, or supplement data included in the blocks in a non-tamper-evident fashion. The systems and techniques described below implement technical solutions for rewriting blocks in the blockchain to allow trusted parties to redact information from the blockchain, without causing the blockchain to fail for its intended purpose. For example, the parties may use a modified blockchain as if it was the earlier, and unmodified, blockchain.
Blockchain rewrites may be used to perform low level (e.g., from a hardware architecture standpoint) operations such as memory rewrites, deletions, and additions. Accordingly, the techniques and architectures may improve the operation of the underlying hardware of a computer system because the system may utilize blockchain protocols for storing data for which verifiability is implemented. For example, operating system software for secure systems may be stored in blockchain payloads to protect the data from manipulation by malware, unauthorized parties, unauthorized devices, or other unintended/unauthorized alterations.
Additionally or alternatively, blocks may represent a smallest increment of data that may be distributed when an update is made. For example, one or more updated block may be sent separately from the entire blockchain during an update. However, in some cases, at least the entire blockchain may be distributed with individual valid updates. For example, when a new secured transaction is performed and added to a ledger secured via a blockchain, the entire blockchain (e.g., full transaction history) may be re-distributed with the updated transaction added. Blockchain rewrite systems, such as exemplary implementations described herein, that allow truncation, right-sizing, extension, or other blockchain size adjustments may improve the operation the underlying hardware by allowing adjustment of the data overhead consumed during blockchain update and distribution.
In addition, the ability of a trusted party to rewrite a blockchain may improve tamper-resistance by providing an established rewrite solution. Accordingly, rather than having to jettison a blockchain due to inappropriate content, a trusted party may rewrite the existing blockchain. Accordingly, blockchain rewrite dramatically improves system efficiency, compared to recreating a new blockchain. Blockchain rewrite may also reduce the probability of a malicious party using a defunct blockchain, which may have been discarded due to inappropriate content, to spoof a system by notifying the system that it did not receive a prior notification of the blockchain discard. Accordingly, the rewritable blockchain may have the technical effect of improved data security and tamper-resistance. In other words, the techniques and architectures discussed herein comprise concrete, real-world applications of and improvements to existing technologies in the marketplace.
Further, the techniques and architectures, including those for rewritable blockchains, distributed key secrets, dual-link blockchains, loops, and other techniques and architectures discussed require one to proceed contrary to accepted wisdom. In particular, conventional approaches to blockchain distributed databases require immutability of the blockchain as a foundational feature. Expressed another way, immutability has been repeatedly explained in prior work as an essential feature in establishing the technological value of a blockchain. Immutability in blockchains has been incorrectly viewed and dictated as the required way to ensure that parties using a blockchain trust the validity of the data contained in the blockchain. Accordingly, the techniques architectures described here that add rewritability to a blockchain proceed contrary to accepted wisdom. The present techniques and architectures proceed contrary to accepted wisdom by introducing rewritability, while still maintaining high security, and therefore the high technological value of the blockchain. As such, despite the significant departures of the present techniques and architectures from prior teachings, the present techniques and architectures provide high levels of trust in the blockchain despite its mutability.
FIG. 1 shows example two example views 100, 150 of a blockchain where each subsequent block includes an integrity code (e.g., a hash, chameleon hash, or other integrity code) using the previous block as an input. For instance, block B1 104 includes a integrity output, IC(B0), in this integrity output field 124 determined from content of previous block B0 102 serving as input to the integrity code. The content of B0 102 used in determination of IC(B0) may include any or all of the fields within B0, such as Data 00 121, the [null] integrity output field 122, or the BlockiD 123. The data fields (e.g., Data 00 121, Data 10 131, and other data fields) of the blocks may be used to store any type of data. For example, the blockchain data fields may hold account data, personal data, transaction data, currency values, contract terms, documents, version data, links, pointers, archival data, other data, or any combination thereof.
The fields in a block that are not used to determine the integrity output in a subsequent block may not necessarily be secured by the blockchain. For example, these fields may be altered without generating coding-inconsistencies among the blocks. Further, if any integrity output field is not used in the determination of the integrity output for a subsequent block in the chain, the blockchain may not necessarily ensure the coding-consistency among blocks discussed above because the unsecured integrity output may be changed without necessarily generating evidence of tamper. Accordingly, in various implementations, the integrity output field and at least a secured portion of a data payload of a block are used in determination of the integrity output for a subsequent block (e.g., the next block) in the blockchain. Similarly, IC(B1) in the integrity output field 125 of block B2 106 may be based on fields within Block B1 104, including, for example, any of the integrity output field 124, the data fields, or the BlockiD field of block B1 104. In the example, the integrity code, IC, may be a chameleon hash, as discussed below.
The blockchain blocks may be locked 152 to one another via the integrity codes. In one sense, the blocks are locked to one another because the integrity code output fields in each of the blocks are based on the content in the previous block at the time the integrity output was generated (e.g., when the block was added to the chain). Accordingly, if a previous block changes after a current block is added, the change will be tamper-evident because the change will be coding-inconsistent with the integrity output stored in the current block. Hence, the content of the previous block is “locked-in” once a current block with a stored integrity output based on the previous block is added to the blockchain. In the example blockchain in FIG. 1, the content of B1 104 may be locked once B2 106, which contains IC(B1) in its integrity output field, is added to the blockchain. As a result, the content of B0 102 which was locked by B1 104 is further secured by B2 106 because B2 106 prevents B1 104 from being changed in a non-tamper-evident manner.
In an example scenario, the rewritable blockchain may be implemented using chameleon hash as the integrity code, as discussed below. However, virtually any code may be used for which tampering is self-evident for parties not in possession of a key secret allowing editing.
FIG. 2 shows two example rewrites 200, 250 to the example blockchain of FIG. 1. In the first example 200, the block B2 202 is replaced with a block B2′ 204 with new content The new block B2′ 204 includes content generated using the key secret such that the integrity output generated when using block B2′ 204 as input is the same as that using original block B2 202 as input. For example, IC(B2)=IC(B2′).
In the second example 250, the block B2 202 is removed. Block B1 206 from the original chain may be replaced with block B1′ 208 to be coding-consistent with the integrity output contained in block B3 210. For example, the block B1′ 208 may include content generated using the key secret such that the updated block B1′ 208 may appear to be the correct block (and is a correct block in terms of the blockchain integrity code) to precede subsequent block B3 210. That is, B1 is replaced after deletion of block B2 so that B1′ can immediately proceed B3 without any violation of the blockchain integrity code.
In various implementations, different rewritable blockchains may have different key secrets. Thus, a trusted party able to rewrite a given blockchain may not necessarily be able to act as a trusted party and rewrite a second, different, blockchain. Using different key secrets for different blockchains may prevent multiple blockchains from being compromised simultaneously through the disclosure of a single key secret. However, multiple blockchains using the same “master” key secret may be generated by blockchain systems (e.g., a key secret may be a master key secret if it may be used with multiple different blockchain systems). Using a common secret among multiple blockchains may allow for more streamlined administration than using different key secrets for the different blockchains.
Additionally or alternatively, a blockchain may have multiple different key secrets that allow non-tamper-evident editing. In an example scenario, a master key secret may be used with multiple blockchains each with individual key secrets that do not necessarily allow non-tamper-evident editing on the other blockchains covered by the master key secret. For instance, blockchains A, B, and C may all allow rewrite with master key secret MK. Further, blockchain A may have an individual rewrite key secret A1, blockchain B may have an individual rewrite key secret B1, and blockchain C may have an individual rewrite key secret C1. In this example, a processing system may rewrite blockchain B using MK or B1, but not with A1 or C1.
Further, in some implementations, a granting key secret may be used to issue key secrets to trusted parties. For example, encrypted cache EC may include additional key secrets for blockchains A, B, and C (e.g., key secrets A2 . . . AN, Bn . . . Bn, C2 . . . Cn). A trusted party in possession of a granting key secret GK to decrypt EC and allow issuance of the stored keys to new trusted parties. In some cases, a master key secret may double as a granting key secret. For example, processing systems may use master key secret to generate block content for rewriting, and the master key secret may serve as a decryption key for an encrypted cache of key secrets.
In addition, distributed key schemes discussed below, may be applied for granting key secrets and master key secrets. In some systems, trusted parties may individually perform rewrites to the blockchain. However, the same trusted parties may combine, using any of the distributed key schemes discussed below, their keys to gain the authority associated with a granting key or master key. For example, three individually trusted parties may each perform rewrites without the assent of the other parties. However, the three parties may be forced to combine their key secrets, e.g., coordinate, to gain granting privileges and grant a fourth party its own key secret.
In various implementations, increased privileges may be obtained through coordination of a specified threshold number of parties, by specific pre-determined parties, by parties of a given class, by all parties, or by another defined group of parties. The distributed key secret scheme may determine the participation level rules for coordination.
In various implementations, keys secrets may be assigned to operators using key secret assignment schemes. The key secret assignment schemes may include assignment schemes based on operator identity, association, priority, or other basis.
In some cases, the blockchain is flagged to indicate that it is subject to editing. The flags or fields indicating that the blockchain is rewritable may identify the trusted parties with authority to rewrite the blockchain. This may assist parties with an interest in rewriting the blockchain in identifying the trusted parties able to perform the rewriting. For example, a blockchain may be accompanied by metadata describing the purpose, original, operational parameter, or other information on the blockchain. Flags for rewriting may be incorporated within the metadata. However, when such metadata is included outside of the blockchain it may be changed without evidence of tampering. Allowing the metadata to be changed freely may reduce computing resources needed to perform an edit and increase the number of parties that may correct a metadata error. In other systems, processing systems may write such metadata into the blocks of the blockchain itself, for example, into dedicated fields or the data payload of blocks. Writing the metadata to the blockchain itself may prevent unauthorized parties from altering blockchain metadata (e.g., for potentially malicious purposes).
In some implementations, the existence of the trusted parties may be kept secret from the untrusted parties or a portion of the trusted parties. In some cases, the integrity code may not necessarily provide an indication by inspection of its operation that trusted parties may edit entries in the blockchain. That is, the algorithm that generates the integrity code does not itself easily reveal that it supports blockchain rewrite. Keeping the existence of the trusted parties in confidence may discourage parties from attempting to steal or otherwise acquire the trusted party's key secret. Further, parties may have increased confidence in the blockchain if the parties assume that the blockchain cannot be edited by another party without the tampering being evident.
In some implementations, entities with knowledge of a key secret may make alterations to the blockchain. This key secret could be in the possession, in whole or in part, of operators, a centralized auditor, or other parties. Additionally or alternatively, shares (e.g., portions) of the key could be distributed among several individually untrusted parties. The integrity code may be a virtual padlock on the link connecting two blocks.
The key secret to open the virtual padlock can be managed according to the requirements of specific applications. For example, in a business negotiation (or government treaty negotiations) a key secret allowing alteration of proposed contract (treaty) terms may be held by neutral third party. Additionally or alternatively, equal portions (e.g., halves, thirds) of the key secret may be held by each party in the negotiation, such that terms may be altered with the consent of all parties or a defined plurality of the parties. In collaborative software design implementations, key secrets may be distributed in portions to stakeholders to enforce consensus before allowing alteration to certain software code. Below, example key secret distribution schemes are discussed, including centralized and distributed schemes. However, other schemes are possible.
FIG. 3 shows an example blockchain processing system (BPS) 300. The BPS 300 may include system logic 314 to support verifications of and rewrites to blockchains. The system logic 314 may include processors 316, memory 320, and/or other circuitry, which may be used to implement the blockchain processing logic 342. The memory 320 may be used to store blockchain metadata 322 and/or blockchain data 324 used in blockchain rewrites and block additions.
The memory may further include program instructions that implement blockchain processing, and one or more supporting data structures, for example, coded objects, templates, or other data structures to support verification of updates to blockchains and detect evidence of tampering. The memory may further include flags 323 which may indicate whether particular blockchains can be edited. In an example, the flags 323, may be implemented using a bit within specific fields within a blockchain or blockchain metadata to indicate editability. Further, the memory 320 may include parameter fields 326 that may include the identities of contact information of the trusted parties, for example, names, addresses, phone, email, or other contact information.
The BPS 300 may also include one or more communication interfaces 312, which may support wireless, e.g. Bluetooth, Wi-Fi, wireless local area network (WLAN), cellular (third generation (3G), fourth generation (4G), Long Term Evolution Advanced (LTE/A)), and/or wired, ethernet, Gigabit ethernet, optical networking protocols. The communication interface 312 may support communication with other parties making updates to blockchains or performing blockchain transfers. The BPS 300 may include power management circuitry 334 and one or more input interfaces 328. The BPS 300 may also include a user interface 318 that may include man-machine interfaces and/or graphical user interfaces (GUI). The GUI may be used to present data from blockchain-based verifications to an operator of the BPS 300. The user interface 318 may also render GUIs with tools to support block additions to blockchains.
FIG. 4 shows an example blockchain rewriting system (BRS) 400. The BRS 400 may be used by, for example, a trusted party performing redaction, revision, or supplementation on a blockchain. For example, a supplementation may include adding content to an existing block. Even in blockchains that do not support non-tamper-evident rewrites, an authorized operator may add a new block, e.g., a new transaction record, to the blockchain. However, alterations to existing blocks (including additions) may generate evidence of tamper unless performed by a trusted party in possession of a key secret. The BRS 400 may include system logic 414 to support verifications, updates, and rewrites to blockchains. The system logic 414 may include processors 416, meory 420, and/or other circuitry, which may be used to implement the blockchain processing logic 442 and the rewrite management logic (RML) 441.
The memory 420 may be used to store blockchain metadata 422 and/or blockchain data 424 used in blockchain rewrites and block additions. The memory 420 may further store key secrets 421, such as an encryption key value, trapdoor information, or other secret value, that may allow non-tamper-evident rewriting of a blockchain. In some cases, the key secrets 421 may be stored in protected memory 480, such as encrypted files or data drives, physically secured drives, drives coupled to triggers for anti-theft countermeasures, or self-deleting drives to prevent accidental or surreptitious disclosure of the stored key secrets 421. The memory storing key secrets may include trusted memory or other memory in possession of or controlled, either directly or indirectly, by a trusted party.
The memory 420 may further include applications and structures, for example, coded objects, templates, or one or more other data structures to support verification of updates to blockchains and detect evidence of tampering. The memory may further include flags 423 which may indicate whether particular blockchains can be edited and the identities of the trusted parties. The BRS 400 may also include one or more communication interfaces 412, which may support wireless, e.g. Bluetooth, Wi-Fi, WLAN, cellular (3G, 4G, LTE/A), and/or wired, ethernet, Gigabit ethernet, optical networking protocols. The communication interface 412 may support communication with other parties making updates to blockchains or performing blockchain transfers. Additionally or alternatively, the communication interface 412 may support secure information exchanges, such as secure socket layer (SSL) or public-key encryption-based protocols for sending and receiving key secrets between trusted parties. Further, the secure protocols may be used to combine key secrets among individually untrusted parties each having some portion of a key secret, as discussed below. The BRS 400 may include power management circuitry 434 and one or more input interfaces 428.
The BRS 400 may also include a user interface 418 that may include man-machine interfaces and/or graphical user interfaces (GUI). The GUI may be used to present data from blockchain-based verifications to an operator of the BRS 400. Additionally or alternatively, the user interface 418 may be used to present blockchain rewriting tools to the operator.
In some cases, the user interface 418 may include a GUI with tools to facilitate blockchain rewrites and deletions. The GUI tools for rewriting may include “what you see is what you get” tools that allow operators to manipulate the content of the blockchain, e.g., using word-processor-like tools, web-editing-like tools, file-browsing-like tools, or any combination thereof. Additionally or alternatively, the user interface 418 may include command-line editing tools. The tools, whether text or graphic based, may allow operators to access key secrets and perform edits on blockchains for which they are authorized. In some cases, the tools may deny writing capabilities to operators lacking the proper key secret for the blockchain that they are attempting to edit. However, in some implementations, the tools may allow such unauthorized editing because it will result in tamper-evident rewrite that will invalidate the unauthorized edits to the blockchain.
FIG. 5 shows example RML 441, which may be implemented in or with circuitry. The RML 441 may handle management of key secrets and implementation of rewrite commands. For example, the RML 441 may determine availability of key secrets for particular blockchains and pass those key secrets to the rewrite logic 600 (discussed below) for execution of the rewrites. The RML 441 may also handle reception of rewrite commands or reception of commands for the automation of blockchain rewrites. Once, the RML 441 identifies the change requested and the blockchain involved, the RML 441 may access the blockchain (502).
The RML 441 may determine whether the memory 420 of the BRS 400 holds a key secret allowing rewrites to the accessed blockchain (504). If the memory 420 does not store the key secret, the RML 441 may determine whether the key secret is accessible via secure communication or via secure combination of portions of the key secret using the communication interface 412 (506). For example, the portions may include portions of a key secret held by parties that individually are untrusted, but as a group, with their portions combined into a full key secret, form a trusted party. In some implementations, the key secret or portion thereof may be accessed via a secure communication using communication interface 412, e.g., to protect against interception of the key secret during communication. If the key secret cannot be accessed, the RML 441 may indicate, via the GUI 418 that non-tamper-evident rewrites to the blockchain are not available (508). If the key secret is accessible, either in memory or via secure communication, the RML 441 may prompt the operator for rewrites to the blockchain (510).
Additionally or alternatively, the RML 441 may automatically obtain rewrites (511). For example, rewrites may be available from a rewrite queue, embedded within a previously received command, obtained from other blockchains, determined from content identified by the systems as malicious code or other inappropriate content, or other rewrites automatically obtained by the RML 441. The rewrites may be stored as a command identifying changes to be made to one or more blocks and, if content is to be added by the change, content to be written to the blocks. The command itself may include the content to be written or, alternatively, may include a pointer to location of the content. The RML 441 may call rewrite logic 600 (see FIG. 6) to perform the rewrites (512). For example, when non-tamper-evident rewrites are available, the BRL 441 may call rewrite logic 600 to execute the rewrites to the block. FIG. 6 shows example rewrite logic 600, which may be implemented in or with circuitry. The rewrite logic 600 may access a blockchain (602). For example, the rewrite logic 600 may access memory where a blockchain is stored. Additionally or alternatively, the rewrite logic 600 may access a blockchain via a networking communication interface (e.g., communication interfaces 412). In some cases, the rewrite logic 600 may access the blockchain using a secured connection or on secured memory as discussed above.
The blockchain may include one or more data blocks that are secured by an integrity code. For example, a rewrite-protected cryptographic hash function, such as a hash function without a key secret for allowing non-tamper-evident rewrites, a chameleon hash, cyclic redundancy checks (CRCs), checksums, or other integrity codes may be used to secure the data blocks within the blockchain. In some implementations, the individual data blocks may be secured by a particular integrity output that is coding-consistent with the data content of the block. For example, an integrity output may be coding-consistent with the content of block when applying the integrity code to the contents of the block that produces that integrity output. When an integrity output is coding-consistent with the data that it secures, the data may be deemed valid. As discussed above, that particular integrity output may be placed within a neighboring block to prevent or frustrate attempts to rewrite the data content in a non-tamper-evident fashion. Further, as discussed below with respect to hybrid blockchains, some blockchains may include portions (e.g., of individual blocks or groups of blocks) that may allow for non-tamper-evident rewrites alongside portions that may not necessarily allow for non-tamper-evident rewrites by trusted parties.
The rewrite logic 600 may access a key secret, such as a cryptographic key or trapdoor information, that is paired to the integrity code of the blockchain (604). The key secret may include data that allows a system, e.g., the BRS 400, to compute collisions, e.g., two different data blocks that produce the same integrity output for the integrity code. Using the computed collisions, a device may rewrite the blockchain without the rewritten blocks being coding-inconsistent with the integrity code. For example, an operator may instruct a BRS 400 compute a collision using a key secret and rewrite a blockchain.
The rewrite logic 600 may receive a command, e.g., from the RML 441, to perform a rewrite on the blockchain (606). For example, the command may have been received from an operator for a trusted party that wishes to replace or delete data (e.g., content) from a particular block. The operator may indicate, e.g., in a command issued through a man-machine interface to the BRS 400, the original data and the replacement data from input devices of a user interface. Additionally or alternatively, commands to replace data may be received via a network communication interface, for example from a terminal associated with the trusted party. The rewrite logic 600 may receive the command to perform the rewrite from the RML 441. Further commands to perform rewrites may originate from automated sources such as those describe above with respect to the RML 441.
The rewrite logic 600 may process the key secret, the replacement data, and the original data to determine additional data for which the replacement data and the additional data produce the same integrity output for the integrity code that is produced for the original data (608). Accordingly, the replacement data and additional data may supplant the original data without necessarily creating evidence of tampering. In an example scenario, where the integrity code is a chameleon hash, the key secret for the chameleon hash allows the rewrite logic 600 to determine collisions for virtually any original data content. In this example scenario, using the key secret, the rewrite logic 600 may compute additional data that produces the same hash output as any given original data when combined with replacement data selected by a trusted entity.
A deletion operation may be executed in the same or similar fashion as other rewrites. However, rather than selecting a replacement data and additional data to be coding-consistent with neighboring blocks (e.g., blocks immediately subsequent or immediately prior in the blockchain). The replacement data and additional data may be selected to be coding-consistent with other blocks further up or down the blockchain. For example, if the replacement data of the rewritten block collides with the data of a subsequent block further down the blockchain (e.g., non-adjacent blocks) rather than that of the block that is being replaced, one or more subsequent blocks (e.g., one or more consecutive blocks in the blockchain immediately following the rewritten block) may be removed. Additionally or alternatively, if the integrity output field in the replacement data includes an integrity output of a block that is two or more blocks prior to the block being replaced, one or more blocks prior to the block being replaced may be deleted. Accordingly, when a rewrite includes a deletion, the rewrite logic 600 may delete one or more blocks prior to or subsequent to the block being rewritten (609).
Once the rewrite logic 600 determines the proper additional data, the rewrite logic 600 may generate the additional data (610) and combine the additional data with the replacement data (612). In some implementations, particularly in schemes where the rewritability of the blockchain is kept confidential, the existence of the additional data may be masked. Thus, a party not in possession of the key secret would not be able to immediately identify the rewritable blockchain as rewritable simply by noting the existence of the additional data.
For example, the additional data may be placed in a field within the blocks that contains data with another identified purpose. For example, the additional data may be appended to integrity output fields or to “randomness” fields as discussed below with regard to FIG. 8.
However, in some cases, declining to expressly identify a specific purpose for the additional data, which would be otherwise incomprehensible, may be sufficient to prevent untrusted operators from developing a suspicion that the blockchain that they are using is a rewritable blockchain.
In various implementations, the chameleon hash may be identifiable by both trusted and untrusted parties to facilitate verification of block content.
The rewrite logic 600 may then write replacement data combined with the additional data in place of the original data (614). For example, the rewrite logic 600 may overwrite the original data with the combined replacement and additional data. Because the combined data is coding-consistent with the integrity output of the original data, the overwrite of the original data may be performed in a non-tamper-evident manner, at least with regard to the integrity code. In other words, the rewrite may be non-tamper-evident, even if replacing the original data with the replacement data alone would result in a tamper-evident rewrite. As discussed below, dual-chain and multiple chain blockchains may be used in some implementations. Accordingly, rewriting a blockchain coding-consistently with a first integrity code of the blockchain may not necessarily result in a fully non-tamper-evident rewrite.
The replacement data may include completely rewritten data; an altered version of the original data, such as a redacted version of the original data; the original data with additions; a complete deletion of the original data; or other data.
The techniques and architectures discussed herein allow rewrites to the content of a blockchain that may be implemented in services, such as decentralized services, that exploit blockchain-based technologies. Non-tamper-evident, validity preserving, or other type of rewrites to a blockchain may be used in various scenarios. The scenarios, for example, may include removing improper content from a blockchain, providing support for applications that use rewritable storage, complying with governmental regulations such as to “the right to be forgotten”, or other scenarios.
The techniques and architectures, including those for rewritable blockchains, distributed key secrets, dual-link blockchains, loops, and other techniques and architectures discussed herein may be used in conjunction with various blockchain consensus techniques. For example, in some cases, rewritable blockchains may be used with proof of work based consensus mechanisms. Accordingly, operators, e.g., untrusted operators, may be granted the ability to append a block to the rewritable blockchain upon finding a solution of a pre-defined challenge and showing proof of work for the solution. In some implementations, consensus mechanisms based on “practical Byzantine fault tolerance” may be implemented. Further, some implementations may use “smart contract” type consensus mechanisms where operators may append blocks upon a showing of compliance with the terms or rules of the smart contract. Integrity codes may be implemented independently of the particular consensus mechanism used in a blockchain. Accordingly, integrity code, include integrity code supporting blockchain rewrites, may be implemented with virtually any blockchain consensus mechanism.
In some implementations, chameleon hash functions, which may allow for efficient determination of hash collisions when given a key secret, may be used by the system e.g., BRS 400. In some cases, the system may use a chameleon hash to grant a trusted entity, multiple individually untrusted parties that together makeup a trusted party, or other entity the ability to make non-tamper-evident rewrites to a blockchain.
In some implementations, a hash function may remain collision resistant even after polynomially many collisions have been already found (using the key secret). This property may be called key-exposure freeness. As discussed below, a transformation may be used to convert a chameleon hash function into one additionally satisfying key-exposure freeness.
FIG. 7A shows two example collision searches 700, 720. For a hash function lacking a key secret (H), collisions may be difficult to find. Accordingly, finding X and X′ such that H(X)=H(x′) may be prohibitively difficult (700). However, for a chameleon hash CH, an device in possession of the key secret 722 may be able to find X and X′ such that CH(X)═CH(X′) (750).
FIG. 7B shows an example rewrite to a blockchain 760 using a collision. Blockchain 760 includes blocks 762, 764, 766, and 768. Block 766 includes integrity output 784. When two different blocks 766, 770 with different content produce the same integrity output 786 for an integrity code, the blocks 766, 770 are a collision for the integrity code (796). Block 766 may be replaced with block 770 and maintain coding-consistence with subsequent block 768 because blocks 766 and 770 produce the same integrity output. However, if block 770 does not contain the proper integrity output (e.g., integrity output 784), block 770 will not be coding-consistent with block 764. With access to the key secret of an integrity code, a party is able to specify the integrity output present in block 770 (797). Accordingly, block 770 can be made coding-consistent with block 764 by including integrity output 784 (798). Block 770 is still coding-consistent with block 768 because Block 770 collides with block 766. Alternatively, if Block 770 is instead constructed to include integrity output 782, the insertion of Block 770 may be used to delete block 764 (799). With integrity output 782, block 770 is coding-consistent with block 762 (as its preceding block) and block 768 (as the block immediately subsequent). Accordingly, block 764 may be removed from the blockchain with no evidence of tamper.
In some real-world applications, an append-only ledger for the majority of parties (to preserve security) that allows rewriting may be implemented. To implement the real-world application, rewriting may be constrained such that it may be performed by trusted parties or in defined circumstances. Two examples of real-world applications are discussed below with regard to FIGS. 13 and 14.
In some cases, applications such as smart contracts or overlay applications may not necessarily work and scale if the blockchain may not be edited in a validly-preserving or non-tamper-evident fashion. A smart contract may include a sequence of instructions, for example computational instructions, that a party performs in exchange for compensation.
Further, rewritable blockchains may provide support for updates to the application that the blockchain is used to secure. If a blockchain-based system is overhauled after inception, a rewritable blockchain may be used to rebuild the blockchain to reflect to overhaul.
Notation
For a string x, its length may be denoted by |x|; if X is a set, |X| may represent the number of elements in X. When x is chosen randomly in X, the selection may be denoted as x←$ X. When A is an algorithm, y←$ A(x) may denote a run of A on input x and output y; if  is randomized, then y is a random variable and A(x; r) may denote a run of A on input x and randomness r. An algorithm A is a probabilistic polynomial-time (PPT) if A is randomized and for any input x, r∈{0, 1}*. The computation of A(x; r) may terminate in after up to poly (|x|) steps.
A security parameter may be denoted as κ
Figure US10348707-20190709-P00001. A fuction v: Figure US10348707-20190709-P00002→[0, 1] may be negligible within the security parameter (or simply negligible) if it vanishes faster than the inverse of any polynomial in κ, i.e. v(κ)=κ−ω(1). For a random variable X, Figure US10348707-20190709-P00003[X=x] may denote the probability that X takes on a particular value x∈X (where X is the set where X is defined). Given two ensembles X={Xκ}κ∈N and Y={Yκ}κ∈N, X≡Y may denote that the two ensembles are identically distributed, and X≈cY may denote that the two ensembles are computationally indistinguishable, for example, for a given scenario.
Public-Key Encryption
A Public-Key Encryption (PKE) scheme i a technique by which information may be exchanged publicly between two or more parties without necessarily disclosing encryption keys, key secrets, or other secrets publicly. Further, PKE may be achieved without necessarily requiring full disclosure of key secrets or other secrets among the parties in the exchange. In an implementation, a PKE may be executed using a tuple of algorithms PKE=(KGen, Enc, Dec) defined as follows: (1) The probabilistic algorithm KGen takes as an input the security parameter κ∈
Figure US10348707-20190709-P00004, and outputs a public/secret key pair (pk, sk). (2) The probabilistic algorithm Enc takes as an input the public key pk, a message m∈M, and implicit randomness ρ∈Rpke, and outputs a ciphertext c=Enc(pk, m; ρ). The set of all ciphertexts is denoted by C. (3) The deterministic algorithm Dec takes as an input the secret key sk and a ciphertext C∈C and outputs m=Dec(sk, c) which is either equal to some message m∈M or to an error symbol ⊥.
In some cases, PKE or other secure exchange schemes may be used by individually-untrusted parties to combine portions or shares of a key secret of the integrity code to generate a full key secret capable of non-tamper-evident rewrites of a blockchain. In some cases secure exchange schemes may be used to ensure that third parties are unable to acquire the portions of the key secret by observing the exchange. Additionally or alternatively, secure exchange schemes may be used by individually untrusted parties to prevent other individually untrusted parties from acquiring multiple portions during an exchange. For example, in an unsecured exchange, once an individually untrusted party collects the portions from the other untrusted parties, the collecting party irrevocably becomes a trusted party. However, in a secure exchange, such as how PKE is implemented, an untrusted party may collect portions of the key secret from the other untrusted parties without actually learning the content of the collected individual portions. Accordingly, collecting portions of the key secret from other individually untrusted parties does not necessarily result in the collecting party becoming a trusted party. Thus, the individually untrusted parties may together makeup a trusted party, but after expiration or other invalidation of a combined key, the individually untrusted parties return to their separate untrusted status until they again agree to combine their individual portions. In some cases, a combined key may expire after a pre-determined period of time or after performing a pre-determined volume of rewrites. For example, the combination process may specify a pre-determined expiration parameter which may delineate a number of rewrites, a number of blocks that may be rewritten, a duration, a volume of data that may be altered, a specific listing of blocks that may be rewritten, one or more event occurrences, or a combination thereof.
In other cases, the key may be combined in such a way that the parties working together can determine the additional content used to perform a non-tamper-evident rewrite of the block. However, no single party necessarily collects a complete (but encrypted) key such that no single party could determine the additional content on behalf of the other parties. Rather, each individually untrusted party within a group that makes up a trusted party may calculate a portion of the additional content (or perform some portion of the processing). An end result from the combined efforts of the individually untrusted parties serves as the additional content to support the non-tamper-evident rewrite of a single block. For any subsequent rewrites, the individually untrusted parties may cooperate again for each specific block that is designated for rewriting by the group that makes up the trusted party.
The individually untrusted parties may be different operators (e.g., entities, institutions, devices, or other parties) with different operator profiles on a single system. Additionally or alternatively, individually untrusted parties may be distributed over multiple systems. The individually untrusted parties may store their respective portions of the key secret in different memory locations, which may have the same or different security features. The individual memory locations may be associated with individual ones of the individually untrusted parties. For example, the memory locations may correspond to a storage device owned or maintained by the respective ones of the individually untrusted parties. Similarly, trusted parties may maintain associated memory locations. In some cases, the memory location may serve as an identifier (whole or in part) of a party. For example, memory location for a trusted party may be used to confirm that the key secret is being controlled (e.g., access control, read/write control, or other control) by a proper trusted party. For example, a key secret may be rejected by the system if it is not accessed from a trusted memory location (e.g., a memory location used, indirectly controlled, maintained, or owned by a trusted party). Similarly, portions of key secret held by untrusted parties may be tied to specific memory locations.
Example implementations that may be used to support the techniques and architectures described above are described below. For example, the implementations discussed below may be used to construct chameleon hashes. However, other integrity codes may be used to non-tamper-evident blockchain rewrites
"""

In [12]:
summarize(text)

' Therefore, the change may be detected in this example scenario. For example, the data in a block of the blockchain may be hashed, run through a checksum, or have another integrity code applied. For example, the parties may use a modified blockchain as if it was the earlier, and unmodified, blockchain. Accordingly, the techniques and architectures may improve the operation of the underlying hardware of a computer system because the system may utilize blockchain protocols for storing data for which verifiability is implemented. Additionally or alternatively, blocks may represent a smallest increment of data that may be distributed when an update is made. For example, one or more updated block may be sent separately from the entire blockchain during an update. In addition, the ability of a trusted party to rewrite a blockchain may improve tamper-resistance by providing an established rewrite solution. Accordingly, rather than having to jettison a blockchain due to inappropriate content,

In [13]:
print ("Length of the original text: ",len(text))
print ("Length of the summarized text: ",len(summarize(text)))

Length of the original text:  49195
Length of the summarized text:  10909
